Abstract

Motivated by linear network coding, communication channels that perform linear operations over finite fields, namely
linear operator channels (LOCs), are studied in this paper. For such a channel, the output vector is a linear transform
of the input vector, and the transformation matrix is randomly and independently generated. The transformation matrix
is assumed to remain constant for every T input vectors and to be unknown to both the transmitter and the receiver.
There are NO constraints on the distribution of the transformation matrix and the field size.

Specifically, the optimality of subspace coding over LOCs is investigated. A lower bound on the maximum 
achievable rate of subspace coding is obtained and it is shown to be tight for some cases. The maximum achievable rate
of constant-dimensional subspace coding is characterized, and the loss of rate incurred by using constant-dimensional
subspace coding is shown to be insignificant.

The maximum achievable rate of channel training is close to the lower bound on the maximum achievable rate of 
subspace coding. Two coding approaches based on channel training are proposed and their performances are evaluated. 
Our first approach makes use of rank-metric codes and its optimality depends on the existence of maximum rank 
distance codes. Our second approach applies linear coding and it can achieve the maximum achievable rate of channel 
training. Our code designs require only the knowledge of the expectation of the rank of the transformation matrix. 
The second scheme can also be realized ratelessly without a priori knowledge of the channel statistics. 

Index Terms 

linear operator channel, linear network coding, subspace coding, channel training 
I. Introduction

Let F be a finite field with q elements. A linear operator channel (LOC) with input $X \in F^{T \times M}$ and output
$Y \in F^{T \times N}$ is given by

$$Y = XH, \tag{1}$$

where H is called the transformation matrix. 
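
The channel law in (1) can be made concrete with a minimal simulation sketch. It assumes a prime field size q, so that integer arithmetic modulo q realizes F; the helper names `mat_mul_mod` and `random_matrix`, the parameter values, and the choice of a uniformly random H are illustrative assumptions, not part of the paper.

```python
import random

def mat_mul_mod(A, B, q):
    """Multiply A (T x M) by B (M x N), reducing every entry modulo q."""
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def random_matrix(rows, cols, q, rng):
    """Matrix with independent, uniformly distributed entries in {0, ..., q-1}."""
    return [[rng.randrange(q) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
q, T, M, N = 2, 4, 3, 3          # assumption: q is prime
H = random_matrix(M, N, q, rng)  # transformation matrix, fixed during one channel use
X = random_matrix(T, M, q, rng)  # one channel input: T row vectors of length M
Y = mat_mul_mod(X, H, q)         # channel output Y = XH
```

Each row of Y is the corresponding row of X multiplied by the same H, which is the sense in which the transformation matrix stays constant for T input vectors.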

Our motivation to study LOCs comes from linear network coding, a research topic that has drawn extensive
interest in the past ten years. Linear network coding is a network transmission technique that can achieve the
capacity of multicasting in communication networks [1]-[4]. Unlike routing, linear network coding allows
network nodes to relay new packets generated by linear combinations. The point-to-point transmission of a network
employing linear network coding is given by a LOC, where H models the network transfer matrix and depends
on the network topology 0, 0. 

A recent research topic where LOCs have found applications is the deterministic model of wireless networks [5],
[6]. This deterministic model provides a good approximation of certain wireless network behaviors and has shown
its impact on the study of wireless networks. When employing linear operations in intermediate network nodes, the 
point-to-point transmission of the deterministic model of wireless networks is also given by a LOC 0,0. 

Even though some aspects of LOCs have been well studied in linear network coding, our understanding of LOCs
is far from complete. In fact, the only case in which LOCs are completely understood is when H has a constant rank M.
However, H in general can be rank deficient (i.e., rk(H) < M) due to changes of network topology, link
failures, packet losses, and so on. Even without these network-related dynamics, H has a random rank when random
linear network coding is applied, where new packets are generated by random linear combinations. Towards more
sophisticated applications of linear network coding, a systematic study of LOCs becomes necessary. In this work,
we study the information theoretic communication limits of LOCs with a general distribution of H and discuss 
coding for LOCs. 

A. Some Related Works 

We review some works on linear network coding that are related to our discussion.

When both the transmitter and the receiver know the instances of H, the transmission through a LOC is called 
the coherent transmission. For a network with fixed and known topology, linear network codes can be designed 
deterministically in polynomial time Q]. The transmission through such a network is usually assumed to be coherent. 
For the coherent transmission, the rank of H determines the capability of information transmission and it is bounded
by the maximum flow from the transmitter to the receiver 0, 0, 0, 0.

In communication networks where the network topology is dynamic and/or unknown, e.g., wireless communication
networks, deterministic design of network coding is difficult to realize. Random linear network coding is an
efficient approach to apply network coding in such communication networks [9]-[13]. The transformation matrix
of a communication network employing random linear network coding, called a random linear coding network
(RLCN), is a random matrix and its instances are assumed to be unknown to both the transmitter and the receiver.
This kind of transmission is referred to as noncoherent transmission. The existing works on the noncoherent
transmission of RLCNs consider several special distributions of H.

In various models and applications of random linear network coding 0, [14]-[17], H is assumed to be an
invertible square matrix¹. This assumption is based on the fact that when H is a square matrix, i.e., M = N, it
is full rank with high probability if i) M is less than or equal to the maximum flow from the transmitter to the
receiver, and ii) the field size for network coding is sufficiently large compared with the number of network nodes
[9], [18]. However, random linear network coding with small finite fields is attractive for its low computational
complexity. For example, wireless sensor networks are characterized by large network size and limited computing
capability of network nodes. Using large finite field operations in sensors may not be a good choice. Moreover, the
maximum flow varies due to the dynamics of wireless networks. For these reasons, full rank transformation matrices
cannot be assumed in many applications.

Kötter and Kschischang [19] introduced a model of random linear network coding, called the Kötter-Kschischang
operator channel (or KK operator channel), that takes vector spaces as input and output, and commits fixed-dimension
erasures and additive errors. Their model considers a special kind of rank deficiency of H that gives fixed-dimension
erasures, defined as the difference between the dimensions of the output and input vector spaces. They introduced
subspace coding for random linear network coding that can be used to correct erasures, defined as the rank difference
between the output and input matrices, as well as additive errors [19]. Silva et al. [20] constructed (unit-block)
subspace codes using rank-metric codes [21], called unit-block lifted rank-metric codes here, which are nearly optimal
in terms of achieving a Singleton-type bound of (unit-block) subspace codes [19]. The coding scheme proposed by
Ho et al. [9] for random linear network coding is a special case of unit-length lifted rank-metric codes for the
transmission without erasures and errors.

Jafari et al. [22], [23] studied H containing uniformly i.i.d. components; such a matrix is called a purely
random matrix. However, there is no rigorous justification of why purely random matrices can reflect the properties
of general random linear network coding. Moreover, the problem-specific techniques used to analyze purely random
matrices are difficult to extend to the general case.

B. Summary of Our Work 

In this paper, we study LOCs without any constraints on the distribution of H. The purely random transformation
matrix and the invertible transformation matrix are special cases in our problem. We allow the transformation matrix
to have arbitrary rank and to contain correlated components. We do not assume large finite fields to guarantee that
H is full rank with high probability. We mainly consider the noncoherent transmission of LOCs by assuming
that the instances of H are unknown to both the transmitter and the receiver.

¹ More generally, the assumption is that H has rank M, which implies $N \ge M$.

Our results can be applied to (random) linear network coding in both wireless and wireline networks without
constraints on the network topology and the field size, as long as the input and output of the network can be
modelled by a LOC. For example, link failures and packet losses, which do not change the linear relation between
the input and output, can be taken into consideration. However, the network transformation can also suffer from
random errors and malicious modifications, for which we have to model the network transformation as

Y = XH + Z, 

and there is no equivalent way to model it as a LOC. We do not consider nonzero Z as discussed in [14], [15],

m. 

Our results are summarized as follows. 

We generalize the concept of subspace coding in [19] to multiple usages of a LOC and study its achievable rates.
Let $C$ be the capacity of a LOC and let $C_{SS}$ be the maximum achievable rate of subspace coding for a LOC. We
obtain that $(1 - M/T)\,E[\mathrm{rk}(H)] + \epsilon(T, q) \le C_{SS} \le C \le E[\mathrm{rk}(H)]$, where
$E[\mathrm{rk}(H)]$ is the expectation of the rank of H and $0 < \epsilon(T, q) < 1.8/(T \log_2 q)$. Moreover, we show
that $C_{SS} = C$ for uniform LOCs, a class of LOCs that includes the purely random transformation matrix and the
invertible transformation matrix studied in [15], [22], [23].

An unknown transformation matrix is regular if its rank can take any value from zero to M. A LOC is regular if
its transformation matrix is regular. For regular LOCs with sufficiently large T, we prove that the lower bound on
$C_{SS}$ is tight, and $C_{SS}$ is achieved by M-dimensional subspace coding. For example, a purely random H with
M < N is uniform and regular. Thus M-dimensional subspace coding achieves its capacity when T is sufficiently
large.

Moreover, $C_{SS}$ can be well approximated by subspace codes using subspaces with the same dimension, called
constant-dimensional subspace codes. Let $C_{C\text{-}SS}$ be the maximum achievable rate of constant-dimensional
subspace coding. We show that $C_{SS} - C_{C\text{-}SS} < (\log_2 \min\{M, N\})/(T \log_2 q)$, which is much smaller
than $C_{SS}$ for practical channel parameters. For general LOCs, we find the optimal dimension $r^*$ such that there
exists an $r^*$-dimensional subspace code achieving $C_{C\text{-}SS}$. Taking the LOCs with an invertible H as an
example, M is the optimal dimension when T > 2M + 1.

Channel training is a coding scheme for LOCs that uses part of the input matrix to recover the instance of H.
The maximum achievable rate of channel training, $C_{CT}$, is $(1 - M/T)\,E[\mathrm{rk}(H)]$, which is very close to
the lower bound on $C_{SS}$. We further propose extended channel training codes to reduce the overhead of channel
training codes. We give upper and lower bounds on the maximum achievable rate of extended channel training
codes and show that the gap between the bounds is small.

The coding scheme proposed by Ho et al. and the unit-block lifted rank-metric codes proposed by Silva et
al. [20] fall in the class of channel training. We show that unit-block lifted rank-metric codes can achieve $C_{CT}$
only when H has a constant rank. If H has an arbitrary rank, the maximum achievable rate of unit-block lifted
rank-metric codes is demonstrated to be far from $C_{CT}$ for certain rank distributions of H.

To achieve $C_{CT}$, we consider two coding schemes. In the first scheme, we extend the method of Silva et al. [20]
to construct codes for LOCs by multiple uses of the channel. The constructed code is called a lifted rank-metric code.
The optimality of lifted rank-metric codes, in the sense of achieving $C_{CT}$, depends on the existence of
maximum-rank-distance (MRD) codes in classical algebraic coding theory, which were first studied in [21]. Specifically,
we show that if $T \gg M$, lifted rank-metric codes can approximately approach $C_{CT}$. Otherwise, since the existence
of MRD codes is unclear, it is uncertain whether lifted rank-metric codes can achieve $C_{CT}$. Existing decoding
algorithms of rank-metric codes can be applied to lifted rank-metric codes. The decoding complexity is $O(n^2)$ field
operations in F, where n is the block length of the codes.

We further propose a class of codes called lifted linear matrix codes, which can achieve $C_{CT}$ for all T > M.
We show that, with probability more than one half, a randomly chosen generator matrix gives good performance. We
obtain the error exponent of decoding lifted linear matrix codes. The decoding of a lifted linear matrix code has
complexity $O(n^3)$ field operations when applying Gaussian elimination. Lifted linear matrix codes can
be realized ratelessly if the channel has feedback at a negligible rate.

Both lifted rank-metric codes and lifted linear matrix codes are universal in the sense that i) only the knowledge
of $E[\mathrm{rk}(H)]$ is required to design codes and ii) a code has similar performance for all LOCs with the same
$E[\mathrm{rk}(H)]$. Furthermore, rateless lifted linear matrix codes do not require any a priori knowledge of channel
statistics.

C. Organization 

This paper also provides a general framework to study LOCs. Some notations and mathematical results that
are used in our discussion, including some counting problems related to projective spaces, are introduced in §II.
Self-contained proofs of these counting problems are given in Appendix A. In §III, linear operator channels are
formally defined, and coherent and noncoherent transmission of LOCs are discussed. In §IV, we give the maximum
achievable rate of a noncoherent transmission scheme, channel training, and study bounds on the maximum
achievable rate of extended channel training. In §V, we reveal an intrinsic symmetric property of LOCs that holds
for any distribution of the transformation matrix. This symmetric property can help to determine the
capacity-achieving input distributions of LOCs. In §VI and §VII, we study subspace coding. From §VIII to §X, two
coding approaches for LOCs are introduced. The last section contains the concluding remarks.

II. Preliminaries 

Let F be the finite field with q elements, $F^t$ be the t-dimensional vector space over F, and $F^{t \times m}$ be the
set of all $t \times m$ matrices over F. For a matrix X, let rk(X) be its rank, let $X^{\top}$ be its transpose, and
let $\langle X \rangle$ be its column space, the subspace spanned by the column vectors of X. Similarly, the row space
of X is denoted by $\langle X^{\top} \rangle$. If V is a subspace of U, we write $V \le U$.

The projective space $\mathrm{Pj}(F^t)$ is the collection of all subspaces of $F^t$. Let $\mathrm{Pj}(m, F^t)$ be the
subset of $\mathrm{Pj}(F^t)$ that contains all the subspaces with dimension less than or equal to m. This paper
involves some counting problems in projective space, which are discussed in Appendix A. Let
$\mathrm{Fr}(F^{m \times r})$ be the set of full rank matrices in $F^{m \times r}$.

April 15, 2010 DRAFT

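Define

$$\chi^m_r = \begin{cases} (q^m - 1)(q^m - q) \cdots (q^m - q^{r-1}) & r > 0 \\ 1 & r = 0 \end{cases} \tag{2}$$

for $r \le m$. By a counting lemma in Appendix A, $|\mathrm{Fr}(F^{m \times r})| = \chi^m_r$. Define

$$\zeta^m_r = \chi^m_r q^{-mr}. \tag{3}$$

Since the number of $m \times r$ matrices is $q^{mr}$, $\zeta^m_r$ can be regarded as the probability that a randomly
chosen $m \times r$ matrix is full rank (see Appendix A). The Grassmannian $\mathrm{Gr}(r, F^t)$ is the set of all
r-dimensional subspaces of $F^t$. Thus $\mathrm{Pj}(m, F^t) = \bigcup_{r \le m} \mathrm{Gr}(r, F^t)$. The Gaussian
binomials are defined as

$$\binom{m}{r}_q = \frac{\chi^m_r}{\chi^r_r}.$$

By a counting lemma in Appendix A, $\binom{t}{r}_q = |\mathrm{Gr}(r, F^t)|$. Let

$$\chi^{m,n}_r = \frac{\chi^m_r \chi^n_r}{\chi^r_r},$$

which is the number of $m \times n$ matrices with rank r (see Appendix A).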
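
The counting quantities above are easy to evaluate and to check by brute force for small parameters. The following sketch is only an illustration: the function names are hypothetical, and the exhaustive check is restricted to the binary field (q = 2), where a matrix row can be stored as an integer bit mask.

```python
from itertools import product

def chi(m, r, q):
    """chi^m_r = (q^m - 1)(q^m - q)...(q^m - q^(r-1)); equals 1 when r = 0."""
    out = 1
    for i in range(r):
        out *= q**m - q**i
    return out

def zeta(m, r, q):
    """Probability that a uniformly random m x r matrix over GF(q) has rank r."""
    return chi(m, r, q) / q**(m * r)

def gauss_binom(m, r, q):
    """Gaussian binomial: the number of r-dimensional subspaces of F^m."""
    return chi(m, r, q) // chi(r, r, q)

def count_rank(m, n, r, q):
    """Number of m x n matrices of rank r over GF(q)."""
    return chi(m, r, q) * chi(n, r, q) // chi(r, r, q)

def rank_gf2(rows):
    """Rank over GF(2); each row is an integer whose bits are the entries."""
    basis = []
    for v in rows:
        for b in basis:
            v = min(v, v ^ b)      # clear the leading bit of b from v if it is set
        if v:
            basis.append(v)
    return len(basis)

# Brute-force check for q = 2: count the 3 x 2 binary matrices of each rank.
m, r = 3, 2
counts = [0] * (r + 1)
for rows in product(range(2**r), repeat=m):   # each row is an r-bit integer
    counts[rank_gf2(rows)] += 1
```

Here `counts` can be compared entry by entry with `count_rank(3, 2, k, 2)`, and its last entry with `chi(3, 2, 2)`.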

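For a discrete random variable (RV) X, we use $p_X$ to denote its probability mass function (PMF). Let X and
Y be RVs over discrete alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. We write a transition probability
(matrix) from $\mathcal{X}$ to $\mathcal{Y}$ as $P_{Y|X}(\mathbf{Y}|\mathbf{X})$, $\mathbf{X} \in \mathcal{X}$ and
$\mathbf{Y} \in \mathcal{Y}$, where bold letters denote instances of the corresponding RVs. When the context is clear,
we may omit the subscripts of $p_X$ and $P_{Y|X}$ to simplify the notation.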

III. Linear Operator Channels 

A. Formulations 

We first introduce a vector formulation of LOCs, which reveals more details than the one given in (1). Let T,
M and N be nonnegative integers. A linear operator channel takes an M-dimensional vector as input and an
N-dimensional vector as output. The ith input $x_i \in F^{1 \times M}$ and the ith output $y_i \in F^{1 \times N}$ are
related by

$$y_i = x_i H_i, \tag{4}$$

where $H_i$ is a random matrix over $F^{M \times N}$. We consider that $H_i$ keeps constant for T consecutive input
vectors, i.e.,

$$H_{nT+1} = H_{nT+2} = \cdots = H_{nT+T}, \quad n = 0, 1, 2, \ldots;$$

and $H_{nT+1}$, $n = 0, 1, \ldots$, are independent and follow the same generic distribution of the random variable H.
By considering T consecutive inputs/outputs as a matrix, we have the matrix formulation given in (1). Here, T is
called the inaction period; $M \times N$ is called the dimension of the LOC. A LOC with transformation matrix H and
inaction period T is denoted by LOC(H, T). Unless otherwise specified, we use the capital letters M and N for
the dimension of LOC(H, T). We will use the matrix formulation of LOCs in this paper exclusively. When we
talk about one use of LOC(H, T), we mean the channel transmits one $T \times M$ matrix.

Fig. 1. A directed network with the source node s and the sink node t. Each edge in the network is a communication
link that can transmit a symbol from F without error. Nodes a and b are relay nodes that apply linear network coding.
The transmitted symbols through links are labeled.

A communication network employing linear network coding can be modeled by a LOC. For example, when
applying linear network coding in relay nodes, the transformation matrix of the network in Fig. 1 is

$$H = \begin{bmatrix} \alpha_1 & \alpha_2 \beta_1 \\ 0 & \beta_2 \end{bmatrix}, \tag{5}$$

in which $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ are linear combination coefficients taking values in F. These
coefficients can be fixed or random depending on the linear network coding approach. For a given network topology,
the general formulation of the transformation matrix of linear network coding can be found in [3].

For wireless networks without centralized control, the transmission of network nodes is spontaneous and the
network topology is dynamic. When employing random linear network coding, the inputs and the outputs of a
wireless network still have linear relations [16], but the formulation of the transformation matrix is difficult to
obtain. The instances of the transformation matrix of random network coding are usually assumed to be unknown
to both the transmitter and the receiver. We will mainly discuss this kind of transmission of LOCs (see §III-C).

The transmission of random linear network coding is packetized. The source node organizes its data into M
packages, called a batch, each of which contains T symbols from F. Network nodes perform linear network
coding among the symbols in the same position of the packages in one batch, and the coding for all the positions
is the same. This packetized transmission matches our assumption that the transformation matrix keeps constant
for T consecutive input vectors. For this reason, the inaction period is also called the packet length. The sink node
tries to collect N (usually, $N \ge M$) packages in this batch to decode the original packages. This gives a physical
meaning to the dimension of LOCs.

B. Coherent Transmission of LOCs 

We call the instances of the transformation matrix the channel information (CI). The transmission with known
CI at both the transmitter and the receiver is called coherent transmission. When the instance of H is $\mathbf{H}$,
the maximum achievable rate of coherent transmission is $\max_{p_X} I(X; Y | H = \mathbf{H})$. Thus, the maximum
achievable rate of coherent transmission (also called the coherent capacity) is

$$C_{co}(H, T) = \sum_{\mathbf{H}} p_H(\mathbf{H}) \max_{p_X} I(X; Y | H = \mathbf{H}).$$

Unless otherwise specified, we use base-2 logarithms in this paper so that $C_{co}(H, T)$ is measured in bits.

² We do not consider the encoding of packages with errors.

Similar to coherent transmission, we can consider the transmission with CI only available at the receiver. We
also assume that X and H are independent; this assumption is consistent with the transmitter not knowing the
instances of H. The maximum achievable rate of such transmission is

$$C_{R\text{-}CI}(H, T) = \max_{p_X} I(X; YH).$$

A random matrix is purely random if it has uniformly independent components. 

Proposition 1: $C_{R\text{-}CI}(H, T) = C_{co}(H, T) = T \log_2 q \, E[\mathrm{rk}(H)]$ and both capacities are
achieved by the purely random input distribution.

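Proof: We first consider the coherent transmission. We know

$$I(X; Y | H = \mathbf{H}) = H(Y | H = \mathbf{H}) - H(Y | X, H = \mathbf{H}) = H(Y | H = \mathbf{H}).$$

Let $x_i$ and $y_i$ be the ith rows of X and Y, respectively. Since $y_i = x_i \mathbf{H}$, i.e., $y_i$ is a vector in
the subspace spanned by the row vectors of $\mathbf{H}$,

$$H(y_i | H = \mathbf{H}) \le \log_2 q^{\mathrm{rk}(\mathbf{H})} = \mathrm{rk}(\mathbf{H}) \log_2 q,$$

in which the equality is achieved when $x_i$ contains uniformly independent components. Hence,

$$H(Y | H = \mathbf{H}) \le \sum_{i=1}^{T} H(y_i | H = \mathbf{H}) \le \mathrm{rk}(\mathbf{H}) T \log_2 q,$$

where the first equality is achieved when $x_i$, $i = 1, \ldots, T$, are independent. Therefore,

$$\begin{aligned}
C_{co}(H, T) &= \sum_{\mathbf{H}} p_H(\mathbf{H}) \max_{p_X} I(X; Y | H = \mathbf{H}) \\
&= \sum_{\mathbf{H}} p_H(\mathbf{H}) \, \mathrm{rk}(\mathbf{H}) T \log_2 q \\
&= E[\mathrm{rk}(H)] T \log_2 q.
\end{aligned}$$

Now we consider the transmission with CI only available at the receiver. We know

$$\begin{aligned}
I(X; YH) &= I(X; Y | H) + I(X; H) \\
&= I(X; Y | H) \\
&= H(Y | H) - H(Y | XH) \\
&= H(Y | H),
\end{aligned}$$

in which $I(X; H) = 0$ since X and H are independent. Similar to the coherent case,

$$\begin{aligned}
H(Y | H) &= \sum_{\mathbf{H}} p_H(\mathbf{H}) H(Y | H = \mathbf{H}) \\
&\le \sum_{i=1}^{T} \sum_{\mathbf{H}} p_H(\mathbf{H}) H(y_i | H = \mathbf{H}) \\
&\le \sum_{i=1}^{T} \sum_{\mathbf{H}} p_H(\mathbf{H}) \, \mathrm{rk}(\mathbf{H}) \log_2 q \\
&= E[\mathrm{rk}(H)] T \log_2 q,
\end{aligned}$$

where the equality is achieved by X with uniformly independent components. ■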
Remark: Note that we do not assume X and H are independent for coherent transmission. In fact, for coherent
transmission, the transmitter can use its knowledge of H in encoding. Without loss of generality, we assume that the
first $\mathrm{rk}(\mathbf{H})$ rows of $\mathbf{H}$ are linearly independent. So the transmitter can encode its
information in an M-dimensional vector which contains nonzero values only in its first $\mathrm{rk}(\mathbf{H})$
components. The receiver can decode these values by solving a linear system of equations. Such a scheme has
transmission rate $\mathrm{rk}(\mathbf{H}) T \log_2 q$, which achieves the coherent capacity. The coding that achieves
$E[\mathrm{rk}(H)] T \log_2 q$ with CI only available at the receiver, discussed in §VIII, is more involved.

C. Noncoherent Transmission of LOCs 

The transmission without knowledge of the CI at either the transmitter or the receiver is called noncoherent
transmission. As in the case with CI only available at the receiver, we assume that H and X are independent
for noncoherent transmission. Under this assumption,

$$\begin{aligned}
p_{XY}(\mathbf{X}, \mathbf{Y}) &= \Pr\{X = \mathbf{X}, Y = \mathbf{Y}\} \\
&= \Pr\{X = \mathbf{X}, \mathbf{X}H = \mathbf{Y}\} \\
&= \Pr\{X = \mathbf{X}\} \Pr\{\mathbf{X}H = \mathbf{Y}\}.
\end{aligned}$$

Thus, the transition probability $P_{Y|X}(\mathbf{Y}|\mathbf{X})$ of noncoherent transmission is given by

$$P_{Y|X}(\mathbf{Y}|\mathbf{X}) = \Pr\{\mathbf{X}H = \mathbf{Y}\}. \tag{6}$$
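
For a small channel, the transition probability in (6) can be tabulated by enumerating the support of H. The sketch below assumes a prime q and a hypothetical three-point distribution of H chosen only for illustration; `transition_prob` and `all_matrices` are illustrative helper names.

```python
from fractions import Fraction
from itertools import product

def mat_mul_mod(A, B, q):
    """Product AB with entries reduced modulo q, returned as a tuple of row tuples."""
    return tuple(tuple(sum(a * b for a, b in zip(row, col)) % q for col in zip(*B))
                 for row in A)

def all_matrices(rows, cols, q):
    """Every rows x cols matrix over {0, ..., q-1}, as a tuple of row tuples."""
    return [tuple(v[i * cols:(i + 1) * cols] for i in range(rows))
            for v in product(range(q), repeat=rows * cols)]

def transition_prob(X, Y, h_dist, q):
    """P(Y|X) = Pr{XH = Y}; h_dist maps each instance of H to its probability."""
    return sum((p for H, p in h_dist.items() if mat_mul_mod(X, H, q) == Y),
               Fraction(0))

# Hypothetical LOC over GF(2) with T = 2, M = 1, N = 2.
q, T, M, N = 2, 2, 1, 2
h_dist = {((1, 1),): Fraction(1, 2),
          ((1, 0),): Fraction(1, 4),
          ((0, 0),): Fraction(1, 4)}
X = ((1,), (0,))
row_total = sum(transition_prob(X, Y, h_dist, q) for Y in all_matrices(T, N, q))
```

For a fixed input, the probabilities over all outputs sum to one, so `row_total` serves as a quick sanity check of the tabulation.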

Unless otherwise specified, we consider noncoherent transmission of LOCs in the rest of this paper. For
noncoherent transmission, a LOC is a discrete memoryless channel (DMC). The (noncoherent) capacity of LOC(H, T)
is

$$C(H, T) = \max_{p_X} I(X; Y).$$

We also consider the normalized channel capacity

$$\bar{C}(H, T) = \frac{C(H, T)}{T \log_2 q}.$$

When we talk about the normalization of a coding rate, we mean to normalize by $T \log_2 q$.

Achieving the capacity generally involves multiple usages of the channel. A block code for LOC(H, T) is a subset
of $(F^{T \times M})^n$, the nth Cartesian power of $F^{T \times M}$. Here n is the length of the block code. Since the
components of codewords are matrices, such a code is called a matrix code. The channel capacity of a LOC can be
approached using a sequence of matrix codes with $n \to \infty$.

In the following subsection, we give the channel capacity and the capacity-achieving inputs of three LOCs.
These examples show that finding the channel capacity is problem-specific. In general, it is not easy to accurately
characterize the (noncoherent) capacity of a LOC. Since an input distribution contains $q^{TM}$ probability masses,
a general method to maximize mutual information, e.g., the Blahut-Arimoto algorithm, has time complexity
$O(q^{TM})$. Moreover, the distribution of the transformation matrix is difficult to obtain in applications like random
linear network coding. Therefore, our goal is to find an efficient method to approach the capacity of LOCs with
limited channel statistics.

D. Examples of Linear Operator Channels 

1) Z-Channel: A Z-channel with crossover probability p is a binary-input-binary-output channel that flips the
input bit 1 with probability p, but maps input bit 0 to 0 with probability 1. A Z-channel is a LOC over the binary
field given by

$$y = xh,$$

where $\Pr\{h = 0\} = p$. We know the capacity of a Z-channel is $C(h, 1) = \log_2(1 + (1 - p) p^{p/(1-p)})$, which is
achieved by

$$p_X(0) = \frac{1 - p^{1/(1-p)}}{1 + (1 - p) p^{p/(1-p)}}.$$
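
The closed-form capacity and optimal input above can be checked numerically. The sketch below evaluates the mutual information of the channel y = xh directly and compares a grid search over the input PMF with the closed forms; the function names and the grid resolution are illustrative choices.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def mutual_information(p0, p):
    """I(X;Y) for y = x*h with Pr{h = 0} = p and input PMF (p0, 1 - p0)."""
    p_y1 = (1 - p0) * (1 - p)       # the output is 1 only if x = 1 and h = 1
    return binary_entropy(p_y1) - (1 - p0) * binary_entropy(p)

def capacity(p):
    """Closed-form Z-channel capacity for 0 < p < 1."""
    return math.log2(1 + (1 - p) * p ** (p / (1 - p)))

def optimal_p0(p):
    """Closed-form capacity-achieving probability of input 0 for 0 < p < 1."""
    return (1 - p ** (1 / (1 - p))) / (1 + (1 - p) * p ** (p / (1 - p)))

p = 0.5
grid_max = max(mutual_information(k / 10000, p) for k in range(10001))
```

For p = 0.5 the closed forms reduce to a capacity of log2(1.25) bits with p_X(0) = 0.6, and the grid search agrees with this value.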

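2) Full Rank Transformation Matrix: Let $H_{full}$ be the random matrix uniformly distributed over
$\mathrm{Fr}(F^{M \times N})$, $M \le N$. For LOC($H_{full}$, T),

$$P_{Y|X}(\mathbf{Y}|\mathbf{X}) = \begin{cases} 1/\chi^{N}_{\mathrm{rk}(\mathbf{X})} & \langle \mathbf{Y} \rangle = \langle \mathbf{X} \rangle \\ 0 & \text{o.w.} \end{cases}$$

This kind of transformation matrix with M = N has been studied in [15]. Let $M^* = \min\{M, T\}$. We know

$$C(H_{full}, T) = \log_2 \sum_{r \le M^*} \binom{T}{r}_q,$$

where $\sum_{r \le M^*} \binom{T}{r}_q = |\mathrm{Pj}(M^*, F^T)|$. Any input $p_X$ satisfying

$$p_{\langle X \rangle}(U) = \frac{1}{|\mathrm{Pj}(M^*, F^T)|}, \quad \forall U \in \mathrm{Pj}(M^*, F^T),$$

is capacity achieving. In other words, this capacity is achieved by using each subspace in $\mathrm{Pj}(M^*, F^T)$
uniformly.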
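
The capacity expression for the full rank case is the logarithm of a subspace count, which can be checked by enumeration for small parameters. The sketch below is an illustration restricted to q = 2 in its brute-force part; `full_rank_capacity` and `column_space_gf2` are hypothetical helper names.

```python
import math
from itertools import product

def chi(m, r, q):
    """chi^m_r = (q^m - 1)(q^m - q)...(q^m - q^(r-1)); equals 1 when r = 0."""
    out = 1
    for i in range(r):
        out *= q**m - q**i
    return out

def gauss_binom(m, r, q):
    """Number of r-dimensional subspaces of F^m."""
    return chi(m, r, q) // chi(r, r, q)

def full_rank_capacity(T, M, q):
    """log2 of the number of subspaces of F^T with dimension at most min(M, T)."""
    m_star = min(M, T)
    return math.log2(sum(gauss_binom(T, r, q) for r in range(m_star + 1)))

def column_space_gf2(columns):
    """Span over GF(2) of columns stored as T-bit integers, as a frozenset."""
    span = {0}
    for c in columns:
        span |= {v ^ c for v in span}
    return frozenset(span)

# Enumerate the column spaces of all 3 x 2 binary matrices (T = 3, M = 2).
T, M = 3, 2
spaces = {column_space_gf2(cols) for cols in product(range(2**T), repeat=M)}
```

With T = 3 and M = 2 over the binary field there are 1 + 7 + 7 = 15 distinct column spaces, so the capacity is log2(15) bits per channel use.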

3) Purely Random Transformation Matrix: Recall that a random matrix is called purely random if it contains
uniformly independent components. Consider LOC($H_{pure}$, T) with purely random $H_{pure}$ and dimension
$M \times N$. We have

$$P_{Y|X}(\mathbf{Y}|\mathbf{X}) = \begin{cases} q^{-N \mathrm{rk}(\mathbf{X})} & \langle \mathbf{Y} \rangle \subseteq \langle \mathbf{X} \rangle \\ 0 & \text{o.w.} \end{cases}$$

Such channels were studied in [22], [23], where capacity formulas involving big-O notation are obtained for
different cases. We will give an exact formula for sufficiently large T,

$$C(H_{pure}, T) = E\left[ \log_2 \frac{\chi^{T}_{\mathrm{rk}(H_{pure})}}{\chi^{M}_{\mathrm{rk}(H_{pure})}} \right].$$

This capacity is achieved by an input $p_X$ with

$$p_X(\mathbf{X}) = \begin{cases} 1/\chi^T_M & \mathrm{rk}(\mathbf{X}) = M \\ 0 & \text{o.w.} \end{cases}$$

In other words, this capacity is achieved by using all the full rank $T \times M$ matrices with equal probability.



IV. Channel Training 

In noncoherent transmission, the CI is not available at either the transmitter or the receiver. But we can deliver
the CI to the receiver using a simple channel training technique. When T > M, we can transmit an $M \times M$
identity matrix as a submatrix of X to recover H at the receiver. For example, if

$$X = \begin{bmatrix} I \\ X' \end{bmatrix},$$

then

$$Y = XH = \begin{bmatrix} H \\ X'H \end{bmatrix}.$$

The first M rows of Y give the instance of H. Thus the last T - M rows of Y can be decoded with the CI. Let
$C_{CT}$ be the maximum achievable rate of such a scheme, and $\bar{C}_{CT}$ be its normalization.
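
The training step can be sketched in a few lines. This illustration assumes a prime q and uses a brute-force search over all candidate input rows in place of a real decoder; the helper names and the example matrices are hypothetical.

```python
from itertools import product

def mat_mul_mod(A, B, q):
    """Product AB with entries reduced modulo q (matrices as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def training_input(X_data, M):
    """Stack an M x M identity matrix on top of the data rows."""
    identity = [[int(i == j) for j in range(M)] for i in range(M)]
    return identity + X_data

def split_output(Y, M):
    """The first M output rows reveal H; the remaining rows carry the data."""
    return Y[:M], Y[M:]

def consistent_inputs(y, H_hat, q):
    """All input rows x with x * H_hat = y, found by exhaustive search."""
    M = len(H_hat)
    return [list(x) for x in product(range(q), repeat=M)
            if mat_mul_mod([list(x)], H_hat, q)[0] == y]

q, M = 2, 2
X_data = [[1, 1], [0, 1]]                  # T - M = 2 data rows, so T = 4
H = [[1, 0, 1], [0, 1, 1]]                 # an instance of H with rank M
Y = mat_mul_mod(training_input(X_data, M), H, q)
H_hat, Y_data = split_output(Y, M)
decoded = [consistent_inputs(y, H_hat, q) for y in Y_data]

# With a rank-deficient instance, several inputs explain the same output row.
H_deficient = [[1, 0, 1], [1, 0, 1]]
ambiguous = consistent_inputs(mat_mul_mod([[1, 1]], H_deficient, q)[0], H_deficient, q)
```

When the instance of H has rank M, every data row is recovered uniquely; with the rank-deficient instance two inputs are indistinguishable, which is why the achievable rate depends on the rank of H.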

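Proposition 2: For LOC(H, T) with dimension $M \times N$ and T > M, $\bar{C}_{CT}(H, T) = (1 - M/T) E[\mathrm{rk}(H)]$.

Proof: Let $X'$ be a random matrix over $F^{(T-M) \times M}$ and let $Y' = X'H$. If the input of LOC(H, T) is
$X = \begin{bmatrix} I \\ X' \end{bmatrix}$, the output is

$$Y = \begin{bmatrix} I \\ X' \end{bmatrix} H = \begin{bmatrix} H \\ Y' \end{bmatrix}.$$

Thus,

$$\begin{aligned}
\bar{C}_{CT}(H, T) &= \max_{p_{X'}} I(X; Y)/(T \log_2 q) \\
&= \max_{p_{X'}} I(X'; Y'H)/(T \log_2 q).
\end{aligned}$$

Since $X'$ and H are independent, we have

$$\begin{aligned}
I(X'; Y'H) &= I(X'; Y' | H) \\
&= H(Y' | H) \\
&\le E[\mathrm{rk}(H)] (T - M) \log_2 q,
\end{aligned}$$

where the equality is achieved by $X'$ with uniformly independent components. ■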

Remark: In this formula of $\bar{C}_{CT}(H, T)$, M/T is just the ratio of the overhead used in channel training.

Corollary 1: $(1 - M/T) E[\mathrm{rk}(H)] \le \bar{C}(H, T) \le E[\mathrm{rk}(H)]$.

Proof: It follows from $\bar{C}_{CT}(H, T) \le \bar{C}(H, T) \le \bar{C}_{R\text{-}CI}(H, T)$. ■

The upper bound and the lower bound are asymptotically tight when T is large. We will further improve the lower
bound by showing that the inequality is strict.

Now we consider how to improve $\bar{C}_{CT}(H, T)$ by reducing the overhead ratio M/T. The method is to apply
channel training to the new channel LOC(GH, T) for a random matrix G with dimension $r \times M$. Note that
$\bar{C}_{CT}(GH, T) = (1 - r/T) E[\mathrm{rk}(GH)] \le (1 - r/T) E[\mathrm{rk}(H)]$. Thus, to achieve a higher rate
than $(1 - M/T) E[\mathrm{rk}(H)]$, we only need to consider r < M. We call this method extended channel training.
The maximum achievable rate of extended channel training is

$$\bar{C}_{ECT}(H, T) = \max_{r \le M} \; \sup_{p_G: \text{ the dimension of } G \text{ is } r \times M} (1 - r/T) E[\mathrm{rk}(GH)].$$

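Theorem 1: For LOC(H, T) with dimension $M \times N$, we have

$$\bar{C}_{ECT}(H, T) \ge \max\left\{ \max_{r < M} (1 - r/T) \left( \sum_{k=0}^{r-1} p_{\mathrm{rk}(H)}(k) \, k \, \zeta^r_k + r \sum_{k=r}^{M} p_{\mathrm{rk}(H)}(k) \, \zeta^k_r \right), \; \bar{C}_{CT}(H, T) \right\},$$

and

$$\bar{C}_{ECT}(H, T) \le \max_{r \le M} (1 - r/T) \left( \sum_{k=0}^{r-1} p_{\mathrm{rk}(H)}(k) \, k + r \sum_{k=r}^{M} p_{\mathrm{rk}(H)}(k) \right).$$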

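Proof: We have that

$$\begin{aligned}
E[\mathrm{rk}(GH)] &= \sum_{s=0}^{r} s \Pr\{\mathrm{rk}(GH) = s\} \\
&= \sum_{s=0}^{r} s \sum_{k=s}^{M} \Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(H) = k\} \, p_{\mathrm{rk}(H)}(k) \\
&= \sum_{k=0}^{M} p_{\mathrm{rk}(H)}(k) \sum_{s \le \min\{k, r\}} s \Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(H) = k\}.
\end{aligned}$$

To prove the first inequality, we consider that G is purely random. By the count of matrices with a given rank
(see Appendix A),

$$\Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(H) = k\} = \frac{\zeta^r_s \zeta^k_s}{\zeta^s_s \, q^{(k-s)(r-s)}}.$$

Then

$$\begin{aligned}
E[\mathrm{rk}(GH)] &= \sum_{k=0}^{M} p_{\mathrm{rk}(H)}(k) \sum_{s \le \min\{k, r\}} s \frac{\zeta^r_s \zeta^k_s}{\zeta^s_s \, q^{(k-s)(r-s)}} \\
&= \sum_{k=0}^{r-1} p_{\mathrm{rk}(H)}(k) \sum_{s \le k} s \frac{\zeta^r_s \zeta^k_s}{\zeta^s_s \, q^{(k-s)(r-s)}} + \sum_{k=r}^{M} p_{\mathrm{rk}(H)}(k) \sum_{s \le r} s \frac{\zeta^r_s \zeta^k_s}{\zeta^s_s \, q^{(k-s)(r-s)}} \\
&\ge \sum_{k=0}^{r-1} p_{\mathrm{rk}(H)}(k) \, k \, \zeta^r_k + \sum_{k=r}^{M} p_{\mathrm{rk}(H)}(k) \, r \, \zeta^k_r.
\end{aligned}$$

Therefore

$$\begin{aligned}
\bar{C}_{ECT}(H, T) &= \max\left\{ \max_{r < M} \; \sup_{p_G: \text{ the dimension of } G \text{ is } r \times M} (1 - r/T) E[\mathrm{rk}(GH)], \; (1 - M/T) E[\mathrm{rk}(H)] \right\} \\
&\ge \max\left\{ \max_{r < M} (1 - r/T) E[\mathrm{rk}(GH)] \big|_{G \text{ is purely random}}, \; (1 - M/T) E[\mathrm{rk}(H)] \right\} \\
&\ge \max\left\{ \max_{r < M} (1 - r/T) \left( \sum_{k=0}^{r-1} p_{\mathrm{rk}(H)}(k) \, k \, \zeta^r_k + r \sum_{k=r}^{M} p_{\mathrm{rk}(H)}(k) \, \zeta^k_r \right), \; (1 - M/T) E[\mathrm{rk}(H)] \right\}.
\end{aligned}$$

To prove the second inequality, we see that

$$\begin{aligned}
E[\mathrm{rk}(GH)] &= \sum_{s=0}^{r} s \Pr\{\mathrm{rk}(GH) = s\} \\
&= \sum_{s=0}^{r} s \sum_{k=s}^{r} \Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(G) = k\} \, p_{\mathrm{rk}(G)}(k) \\
&= \sum_{k=0}^{r} p_{\mathrm{rk}(G)}(k) \sum_{s \le k} s \Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(G) = k\}.
\end{aligned}$$

By

$$\begin{aligned}
\sum_{s \le k} s \Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(G) = k\} &= \sum_{j < k} \sum_{s > j} \Pr\{\mathrm{rk}(GH) = s \mid \mathrm{rk}(G) = k\} \\
&= \sum_{j < k} \Pr\{\mathrm{rk}(GH) > j \mid \mathrm{rk}(G) = k\} \\
&\le \sum_{j < k} \Pr\{\mathrm{rk}(H) > j\} \\
&= \sum_{j < k} \sum_{s > j} p_{\mathrm{rk}(H)}(s) \\
&= \sum_{s \le k} s \, p_{\mathrm{rk}(H)}(s) + k \sum_{s > k} p_{\mathrm{rk}(H)}(s),
\end{aligned}$$

we have

$$\begin{aligned}
E[\mathrm{rk}(GH)] &\le \sum_{k=0}^{r} p_{\mathrm{rk}(G)}(k) \left( \sum_{s \le k} s \, p_{\mathrm{rk}(H)}(s) + k \sum_{s > k} p_{\mathrm{rk}(H)}(s) \right) \\
&\le \max_{k \le r} \left( \sum_{s \le k} s \, p_{\mathrm{rk}(H)}(s) + k \sum_{s > k} p_{\mathrm{rk}(H)}(s) \right).
\end{aligned}$$

The term in parentheses is nondecreasing in k, so the maximum is attained at k = r, which gives the upper bound
in the theorem. The proof is completed. ■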

Corollary 2: Let $\bar{C}^{upper}_{ECT}(H, T)$ and $\bar{C}^{lower}_{ECT}(H, T)$ be the upper bound and the lower bound
of $\bar{C}_{ECT}(H, T)$ in Theorem 1, respectively. When T is sufficiently large,

$$\bar{C}^{upper}_{ECT}(H, T) - \bar{C}^{lower}_{ECT}(H, T) \le E[\mathrm{rk}(H)] \frac{M - r^*}{T},$$

where $r^* = \max\{r : p_{\mathrm{rk}(H)}(r) > 0\}$. This means that if $p_{\mathrm{rk}(H)}(M) > 0$,
$\bar{C}^{upper}_{ECT}(H, T) = \bar{C}^{lower}_{ECT}(H, T)$ when T is sufficiently large. Fixing the rank distribution
of H, we have

$$\lim_{q \to \infty} \bar{C}^{lower}_{ECT}(H, T) = \bar{C}^{upper}_{ECT}(H, T).$$

Proof: By the lower bound of $\bar{C}_{ECT}(H, T)$, we have
$\bar{C}^{lower}_{ECT}(H, T) \ge (1 - M/T) E[\mathrm{rk}(H)]$. Let

$$a(r) = \sum_{s < r} s \, p_{\mathrm{rk}(H)}(s) + r \sum_{s \ge r} p_{\mathrm{rk}(H)}(s).$$

Since $a(r) = a(r^*)$ for all $r \ge r^*$, we only need to consider $r \le r^*$ to find the upper bound of
$\bar{C}_{ECT}(H, T)$, i.e.,

$$\bar{C}^{upper}_{ECT}(H, T) = \max_{r \le r^*} (1 - r/T) a(r).$$

Fix $r < r^*$. We know that $a(r) < a(r^*) = a(M)$. Hence $(1 - r/T) a(r) < (1 - r^*/T) a(M)$ when
$T > (r^* a(M) - r a(r))/(a(M) - a(r))$. Therefore when $T > \max_{r < r^*} (r^* a(M) - r a(r))/(a(M) - a(r))$,
$\bar{C}^{upper}_{ECT}(H, T) = (1 - r^*/T) E[\mathrm{rk}(H)]$, and subtracting the lower bound
$(1 - M/T) E[\mathrm{rk}(H)]$ gives the first claim.

The second part of this corollary follows from $\zeta^m_r \to 1$ when $q \to \infty$. ■

In Fig. 2, we illustrate the bounds of $\bar{C}_{ECT}(H, T)$ and $\bar{C}_{CT}(H, T)$ over the binary field.
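
The bounds of Theorem 1 depend only on the rank distribution of H, on T and on q, so they are straightforward to evaluate. The sketch below is a numerical illustration under an assumed rank distribution (`p_rank[k]` stands for the probability that rk(H) = k, with M = 4); the function names and the example numbers are hypothetical and are not taken from the paper's figures.

```python
def zeta(m, r, q):
    """zeta^m_r = chi^m_r q^(-mr), computed as prod_{i<r} (1 - q^(i-m))."""
    out = 1.0
    for i in range(r):
        out *= 1 - q ** (i - m)
    return out

def expected_rank(p_rank):
    return sum(k * p for k, p in enumerate(p_rank))

def ct_rate(p_rank, T):
    """Normalized channel training rate (1 - M/T) E[rk(H)]."""
    M = len(p_rank) - 1
    return (1 - M / T) * expected_rank(p_rank)

def ect_upper(p_rank, T):
    """Upper bound of Theorem 1 on the extended channel training rate."""
    M = len(p_rank) - 1
    return max((1 - r / T) * (sum(k * p_rank[k] for k in range(r)) + r * sum(p_rank[r:]))
               for r in range(M + 1))

def ect_lower(p_rank, T, q):
    """Lower bound of Theorem 1 (purely random G, or plain channel training)."""
    M = len(p_rank) - 1
    best = ct_rate(p_rank, T)
    for r in range(M):
        inner = (sum(p_rank[k] * k * zeta(r, k, q) for k in range(r))
                 + r * sum(p_rank[k] * zeta(k, r, q) for k in range(r, M + 1)))
        best = max(best, (1 - r / T) * inner)
    return best

p_rank = [0.0, 0.1, 0.4, 0.4, 0.1]   # assumed distribution of rk(H), M = 4
bounds_T16 = (ct_rate(p_rank, 16), ect_lower(p_rank, 16, 2), ect_upper(p_rank, 16))
```

For this assumed distribution, the upper bound at T = 16 exceeds the channel training rate, while for large T the two bounds and the channel training rate coincide, in line with Corollary 2; increasing q moves the lower bound towards the upper bound.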

V. Symmetric Property and Optimal Input Distributions 

Here we introduce an intrinsic symmetric property of LOCs and show that this property is helpful to find an 
optimal input distribution of LOCs. 

A. Random Variables and Markov Chains Related to LOCs 

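We introduce several RVs related to LOCs, which are used extensively in this paper. Let $X$ be a RV over $\mathbb{F}^{t\times m}$. We denote by $\langle X\rangle$ a RV over $\mathrm{Pj}(\mathbb{F}^t)$ with

\[
p_{\langle X\rangle}(U) = \Pr\{\langle X\rangle = U\} = \sum_{\mathbf{X}\in\mathbb{F}^{t\times m}:\,\langle\mathbf{X}\rangle = U} p_X(\mathbf{X}). \tag{7}
\]

Denote by $X^{\top}$ a RV over $\mathbb{F}^{m\times t}$ with $p_{X^{\top}}(\mathbf{X}^{\top}) = p_X(\mathbf{X})$. Combining the above notations, $\langle X^{\top}\rangle$ is a RV over $\mathrm{Pj}(\mathbb{F}^m)$ with

\[
p_{\langle X^{\top}\rangle}(V) = \sum_{\mathbf{X}\in\mathbb{F}^{t\times m}:\,\langle\mathbf{X}^{\top}\rangle = V} p_X(\mathbf{X}).
\]

Furthermore, denote by $\mathrm{rk}(X)$ a RV with

\[
p_{\mathrm{rk}(X)}(r) = \sum_{\mathbf{X}:\,\mathrm{rk}(\mathbf{X}) = r} p_X(\mathbf{X}). \tag{8}
\]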
Fig. 3. Random variables and Markov chains related to LOC(H, T).



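It is easy to see that $\mathrm{rk}(X)$ is a deterministic function of $\langle X\rangle$ (of $\langle X^{\top}\rangle$), and $\langle X\rangle$ ($\langle X^{\top}\rangle$) is a deterministic function of $X$.

Now we consider LOC$(H,T)$ with dimension $M\times N$. Applying the above definitions to the input $X$ and the output $Y$, we obtain the RVs shown in Fig. 3. These RVs are given as the nodes of a directed graph. All the RVs on a directed path form a Markov chain. For example, $\mathrm{rk}(X) \to \langle X\rangle \to X \to Y \to \langle Y\rangle \to \mathrm{rk}(Y)$ forms a Markov chain. Let $r$, $U$, $\mathbf{X}$, $\mathbf{Y}$, $V$ and $s$ be instances of $\mathrm{rk}(X)$, $\langle X\rangle$, $X$, $Y$, $\langle Y\rangle$ and $\mathrm{rk}(Y)$, respectively. To verify this Markov chain, we only need to check the deterministic relations between these RVs:

\[
p(r, U, \mathbf{X}, \mathbf{Y}, V, s) =
\begin{cases}
p(\mathbf{X}, \mathbf{Y}) & \text{if } \langle\mathbf{X}\rangle = U,\ \dim(U) = r,\ \langle\mathbf{Y}\rangle = V,\ \dim(V) = s,\\
0 & \text{o.w.,}
\end{cases}
\]

\[
p_{\mathrm{rk}(X)\langle X\rangle}(r, U) =
\begin{cases}
p_{\langle X\rangle}(U) & \text{if } \dim(U) = r,\\
0 & \text{o.w.,}
\end{cases}
\]

and

\[
p_{\langle Y\rangle\mathrm{rk}(Y)}(V, s) =
\begin{cases}
p_{\langle Y\rangle}(V) & \text{if } \dim(V) = s,\\
0 & \text{o.w.}
\end{cases}
\]

Using the above relations, we are ready to see that

\[
p(r, U, \mathbf{X}, \mathbf{Y}, V, s)\, p(U)\, p(\mathbf{X})\, p(\mathbf{Y})\, p(V)
= p(r, U)\, p(U, \mathbf{X})\, p(\mathbf{X}, \mathbf{Y})\, p(\mathbf{Y}, V)\, p(V, s),
\]

which matches an alternative definition of a Markov chain given in [25, §2.1]. The other Markov chains shown in Fig. 3 can be verified accordingly.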

B. A Symmetric Property 

The next proposition states a symmetric property of LOCs. Even though its proof is straightforward, this 
proposition plays a fundamental role in this paper. We say a matrix is full column (row) rank if its rank is equal 
to its number of columns (rows). 

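Proposition 3: Consider LOC$(H,T)$. For any matrix $\mathbf{B}$ with $T$ rows and full column rank,

\[
P_{Y|X}(\mathbf{B}\mathbf{E}\,|\,\mathbf{B}\mathbf{D}) = \Pr\{\mathbf{D}H = \mathbf{E}\}.
\]

Proof: We know

\[
P_{Y|X}(\mathbf{B}\mathbf{E}\,|\,\mathbf{B}\mathbf{D}) = \Pr\{\mathbf{B}\mathbf{D}H = \mathbf{B}\mathbf{E}\} = \Pr\{\mathbf{D}H = \mathbf{E}\},
\]

where the last equality follows because $\mathbf{B}$ has full column rank. ■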
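Proposition 3 can be verified by brute force for a toy channel. The sketch below is an illustration only: it assumes the binary field, $T = 3$, $M = N = 2$, and an arbitrary randomly weighted PMF for $H$; the helper names (`matmul`, `transition`, `check_prop3`) are hypothetical.

```python
import itertools
import random
from fractions import Fraction

def matmul(A, B):
    """Matrix product over GF(2); matrices are tuples of row tuples."""
    return tuple(tuple(sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B))
                 for row in A)

def all_matrices(rows, cols):
    """Every rows x cols matrix over GF(2)."""
    for bits in itertools.product((0, 1), repeat=rows * cols):
        yield tuple(bits[i * cols:(i + 1) * cols] for i in range(rows))

def transition(X, Y, h_pmf):
    """P_{Y|X}(Y|X) = Pr{XH = Y} for a transformation matrix H with PMF h_pmf."""
    return sum(p for H, p in h_pmf.items() if matmul(X, H) == Y)

def check_prop3(B, h_pmf, r, M, N):
    """True iff P_{Y|X}(BE|BD) = Pr{DH = E} for all r x M D and r x N E."""
    return all(transition(matmul(B, D), matmul(B, E), h_pmf) == transition(D, E, h_pmf)
               for D in all_matrices(r, M) for E in all_matrices(r, N))

# Hypothetical H: an arbitrary (non-uniform) PMF on 2 x 2 binary matrices.
random.seed(0)
weights = {H: random.randint(1, 5) for H in all_matrices(2, 2)}
total = sum(weights.values())
h_pmf = {H: Fraction(w, total) for H, w in weights.items()}

B_full = ((1, 0), (1, 1), (0, 1))   # T = 3 rows, rank 2: full column rank
B_zero = ((0, 0), (0, 0), (0, 0))   # rank 0: the hypothesis of the proposition fails
holds_for_full_rank = check_prop3(B_full, h_pmf, 2, 2, 2)
holds_for_zero = check_prop3(B_zero, h_pmf, 2, 2, 2)
```

The identity holds for the full-column-rank `B_full` and fails for `B_zero`, which shows that the full column rank condition is what allows $\mathbf{B}$ to be cancelled.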
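Let $\mathbf{B}$ be a $t\times r$ matrix with rank $r$. For a $t\times m$ matrix $\mathbf{A}$ with $\langle\mathbf{A}\rangle \subset \langle\mathbf{B}\rangle$, define $\mathbf{A}/\mathbf{B}$ to be the matrix such that $\mathbf{A} = \mathbf{B}(\mathbf{A}/\mathbf{B})$. The notation "/" is well defined because i) there must exist $\mathbf{C}$ such that $\mathbf{A} = \mathbf{B}\mathbf{C}$ since $\langle\mathbf{A}\rangle \subset \langle\mathbf{B}\rangle$, and ii) such $\mathbf{C}$ is unique since $\mathbf{B}$ has full column rank.

Let $\mathbf{X}$ and $\mathbf{Y}$ be the input and output matrices of a LOC, respectively, with $\langle\mathbf{Y}\rangle \le \langle\mathbf{X}\rangle$. Fix a full column rank matrix $\mathbf{B}$ with $\langle\mathbf{X}\rangle = \langle\mathbf{B}\rangle$. Prop. 3 tells that

\[
P_{Y|X}(\mathbf{Y}|\mathbf{X}) = \Pr\{(\mathbf{X}/\mathbf{B})H = \mathbf{Y}/\mathbf{B}\}. \tag{9}
\]

The dimension of $\mathbf{X}/\mathbf{B}$ is $\mathrm{rk}(\mathbf{X})\times M$ and the dimension of $\mathbf{Y}/\mathbf{B}$ is $\mathrm{rk}(\mathbf{X})\times N$. This means that the transition probability $P_{Y|X}$ does not depend on the inaction period $T$. See the examples in §III-D. In the following, we give two useful forms of this symmetric property.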

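Corollary 3: Let $\mathbf{X}$ be an input matrix of LOC$(H,T)$. Then,

\[
P_{\mathrm{rk}(Y)|X}(s\,|\,\mathbf{X}) = P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s\,|\,\langle\mathbf{X}^{\top}\rangle) = \Pr\{\mathrm{rk}(\mathbf{D}H) = s\},
\]

where $\mathbf{D}$ is any $\mathrm{rk}(\mathbf{X})\times M$ matrix with $\langle\mathbf{D}^{\top}\rangle = \langle\mathbf{X}^{\top}\rangle$.

Proof: Fix a $\mathrm{rk}(\mathbf{X})\times M$ matrix $\mathbf{D}$ with $\langle\mathbf{X}^{\top}\rangle = \langle\mathbf{D}^{\top}\rangle$. Let $\mathbf{B}^{\top} = \mathbf{X}^{\top}/\mathbf{D}^{\top}$, so that $\mathbf{X} = \mathbf{B}\mathbf{D}$. We know $\mathbf{B}$ has full column rank. Since $X \to Y \to \mathrm{rk}(Y)$ forms a Markov chain,

\[
\begin{aligned}
P_{\mathrm{rk}(Y)|X}(s\,|\,\mathbf{X}) &= \sum_{\mathbf{Y}} P_{\mathrm{rk}(Y)|Y}(s\,|\,\mathbf{Y})\, P_{Y|X}(\mathbf{Y}|\mathbf{X}) \\
&= \sum_{\mathbf{Y}:\,\mathrm{rk}(\mathbf{Y}) = s} P_{Y|X}(\mathbf{Y}|\mathbf{X}) \\
&= \sum_{\mathbf{Y}:\,\mathrm{rk}(\mathbf{Y}) = s} \Pr\{\mathbf{D}H = \mathbf{Y}/\mathbf{B}\} \qquad (10)\\
&= \sum_{\mathbf{E}:\,\mathrm{rk}(\mathbf{E}) = s} \Pr\{\mathbf{D}H = \mathbf{E}\} \\
&= \Pr\{\mathrm{rk}(\mathbf{D}H) = s\},
\end{aligned}
\]

where (10) follows from (9).

Let $U = \langle\mathbf{X}^{\top}\rangle$. By the Markov chain $\langle X^{\top}\rangle \to X \to \mathrm{rk}(Y)$,

\[
\begin{aligned}
P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s\,|\,U) &= \sum_{\mathbf{X}':\,\langle\mathbf{X}'^{\top}\rangle = U} P_{\mathrm{rk}(Y)|X}(s\,|\,\mathbf{X}')\, P_{X|\langle X^{\top}\rangle}(\mathbf{X}'\,|\,U) \\
&= \Pr\{\mathrm{rk}(\mathbf{D}H) = s\} \sum_{\mathbf{X}':\,\langle\mathbf{X}'^{\top}\rangle = U} P_{X|\langle X^{\top}\rangle}(\mathbf{X}'\,|\,U) \\
&= \Pr\{\mathrm{rk}(\mathbf{D}H) = s\}.
\end{aligned}
\]

The proof is completed. ■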
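Corollary 3 says that the rank of the output depends on the input only through its row space. A small exhaustive check over the binary field is sketched below; the parameters $T = 3$, $M = N = 2$ and the randomly weighted distribution of $H$ are hypothetical choices, and the helper names are not from the paper.

```python
import itertools
import random

def matmul(A, B):
    """Matrix product over GF(2); A is a sequence of row tuples."""
    return [tuple(sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B))
            for row in A]

def all_matrices(rows, cols):
    """Every rows x cols matrix over GF(2)."""
    for bits in itertools.product((0, 1), repeat=rows * cols):
        yield tuple(bits[i * cols:(i + 1) * cols] for i in range(rows))

def rank(rows, n):
    """Rank over GF(2) of a matrix with n columns, via the size of its row space."""
    span = {(0,) * n}
    for row in rows:
        span |= {tuple(a ^ b for a, b in zip(v, row)) for v in span}
    return len(span).bit_length() - 1

def row_basis(X, n):
    """A rk(X) x n matrix D whose rows span the row space of X."""
    D = []
    for row in X:
        if rank(D + [row], n) > len(D):
            D.append(row)
    return D

def rank_law(X, h_weights, N):
    """Unnormalized law of rk(XH): total weight of the H giving each rank."""
    law = {}
    for H, w in h_weights.items():
        s = rank(matmul(X, H), N)
        law[s] = law.get(s, 0) + w
    return law

T, M, N = 3, 2, 2
random.seed(1)
h_weights = {H: random.randint(1, 5) for H in all_matrices(M, N)}  # hypothetical H
same_law = all(rank_law(X, h_weights, N) == rank_law(row_basis(X, M), h_weights, N)
               for X in all_matrices(T, M))
```

Here `row_basis(X, M)` plays the role of $\mathbf{D}$ in the corollary, and `same_law` confirms that $\mathrm{rk}(\mathbf{X}H)$ and $\mathrm{rk}(\mathbf{D}H)$ have the same distribution for every input $\mathbf{X}$.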

Corollary 4: Consider LOC$(H,T)$. For any $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$,

\[
P_{Y|X}(\Phi\mathbf{Y}\,|\,\Phi\mathbf{X}) = P_{Y|X}(\mathbf{Y}|\mathbf{X}). \tag{11}
\]

Proof: This is a special case of Prop. 3. ■

C. a-type Input Distributions 

For a DMC, a capacity-achieving input is also referred to as an optimal input. It is well known that the channel capacity of a symmetric channel is achieved by the symmetric input distribution [24]. Even though in general LOCs are not symmetric channels, the symmetric property we have shown is still helpful to find an optimal input.

Definition 1: A PMF $p$ over $\mathbb{F}^{T\times M}$ is a-type if $p(\mathbf{X}) = p(\mathbf{X}')$ for all $\mathbf{X}, \mathbf{X}' \in \mathbb{F}^{T\times M}$ with $\langle\mathbf{X}^{\top}\rangle = \langle\mathbf{X}'^{\top}\rangle$.

For example, the input distribution

\[
p_X(\mathbf{X}) =
\begin{cases}
1/\chi^T_M & \mathrm{rk}(\mathbf{X}) = M,\\
0 & \text{o.w.}
\end{cases}
\]

is the a-type input with $p_{\mathrm{rk}(X)}(M) = 1$.
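Lemma 1: A function $p : \mathbb{F}^{T\times M} \to \mathbb{R}$ is an a-type PMF if and only if it can be written as

\[
p(\mathbf{X}) = Q(\langle\mathbf{X}^{\top}\rangle)/\chi^T_{\mathrm{rk}(\mathbf{X})} \tag{12}
\]

for a certain PMF $Q$ over $\mathrm{Pj}(\min\{M,T\}, \mathbb{F}^M)$.

Proof: Assume $p$ is an a-type input. Define $Q : \mathrm{Pj}(\min\{M,T\}, \mathbb{F}^M) \to \mathbb{R}$ as

\[
Q(U) = \sum_{\mathbf{X}'\in\mathbb{F}^{T\times M}:\,\langle\mathbf{X}'^{\top}\rangle = U} p(\mathbf{X}').
\]

For $\mathbf{X} \in \mathbb{F}^{T\times M}$,

\[
\begin{aligned}
Q(\langle\mathbf{X}^{\top}\rangle) &= \sum_{\mathbf{X}'\in\mathbb{F}^{T\times M}:\,\langle\mathbf{X}'^{\top}\rangle = \langle\mathbf{X}^{\top}\rangle} p(\mathbf{X}') \\
&= p(\mathbf{X}) \sum_{\mathbf{X}'\in\mathbb{F}^{T\times M}:\,\langle\mathbf{X}'^{\top}\rangle = \langle\mathbf{X}^{\top}\rangle} 1 \\
&= p(\mathbf{X})\,\chi^T_{\mathrm{rk}(\mathbf{X})},
\end{aligned}
\]

where the last equality follows from Lemma 20. This proves the necessary condition.

Now we prove the sufficient condition. Let $Q$ be a PMF over $\mathrm{Pj}(\min\{M,T\}, \mathbb{F}^M)$. Define a function $p : \mathbb{F}^{T\times M} \to \mathbb{R}$ as

\[
p(\mathbf{X}) = Q(\langle\mathbf{X}^{\top}\rangle)/\chi^T_{\mathrm{rk}(\mathbf{X})}.
\]

We can check that for $\mathbf{X}, \mathbf{X}' \in \mathbb{F}^{T\times M}$ with $\langle\mathbf{X}^{\top}\rangle = \langle\mathbf{X}'^{\top}\rangle$,

\[
p(\mathbf{X}) = Q(\langle\mathbf{X}^{\top}\rangle)/\chi^T_{\mathrm{rk}(\mathbf{X})} = Q(\langle\mathbf{X}'^{\top}\rangle)/\chi^T_{\mathrm{rk}(\mathbf{X}')} = p(\mathbf{X}'),
\]

and

\[
\begin{aligned}
\sum_{\mathbf{X}} p(\mathbf{X}) &= \sum_{U\in\mathrm{Pj}(\min\{M,T\},\mathbb{F}^M)}\ \sum_{\mathbf{X}:\,\langle\mathbf{X}^{\top}\rangle = U} Q(U)/\chi^T_{\dim(U)} \\
&= \sum_{U\in\mathrm{Pj}(\min\{M,T\},\mathbb{F}^M)} Q(U)/\chi^T_{\dim(U)} \sum_{\mathbf{X}:\,\langle\mathbf{X}^{\top}\rangle = U} 1 \\
&= \sum_{U\in\mathrm{Pj}(\min\{M,T\},\mathbb{F}^M)} Q(U) \\
&= 1.
\end{aligned}
\]

Thus $p$ is an a-type PMF. ■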
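The counting fact behind (12), namely that exactly $\chi^T_r$ matrices in $\mathbb{F}^{T\times M}$ share a given $r$-dimensional row space, can be confirmed by enumeration. The sketch below assumes the binary field and takes $\chi^T_r = \prod_{i<r}(q^T - q^i)$, the number of $T\times r$ matrices with full column rank; the function names and the uniform choice of $Q$ are hypothetical.

```python
import itertools
from collections import Counter
from fractions import Fraction

def chi(T, r, q=2):
    """chi^T_r = prod_{i<r} (q^T - q^i): number of T x r full-column-rank matrices."""
    out = 1
    for i in range(r):
        out *= q ** T - q ** i
    return out

def all_matrices(rows, cols):
    """Every rows x cols matrix over GF(2)."""
    for bits in itertools.product((0, 1), repeat=rows * cols):
        yield tuple(bits[i * cols:(i + 1) * cols] for i in range(rows))

def row_space(X, n):
    """Row space of X over GF(2), represented as a frozenset of vectors."""
    span = {(0,) * n}
    for row in X:
        span |= {tuple(a ^ b for a, b in zip(v, row)) for v in span}
    return frozenset(span)

def dim(U):
    """Dimension of a GF(2) subspace given as a set of 2^dim vectors."""
    return len(U).bit_length() - 1

def count_by_row_space(T, M):
    """Number of T x M binary matrices having each possible row space."""
    return Counter(row_space(X, M) for X in all_matrices(T, M))

def a_type_pmf(Q, T, M):
    """Build p(X) = Q(<X^T>) / chi^T_{rk(X)} as in (12)."""
    return {X: Q[row_space(X, M)] / chi(T, dim(row_space(X, M)))
            for X in all_matrices(T, M)}

T, M = 3, 2
counts = count_by_row_space(T, M)
Q = {U: Fraction(1, len(counts)) for U in counts}   # hypothetical uniform Q
p = a_type_pmf(Q, T, M)
```

With `Q` uniform over the five subspaces of $\mathbb{F}_2^2$, the resulting `p` sums to one and is constant on each class of matrices with a common row space, as Lemma 1 requires.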
The following theorem tells that we only need to consider a-type inputs to study the capacity of LOCs.
Theorem 2: For a LOC there exists an a-type input that maximizes $I(X;Y)$.
Proof: This theorem is proved using Cor. 4 and the concavity of mutual information as a function of the input distribution. See §V-D for details. ■
Let $M^* = \min\{T, M\}$. Theorem 2 narrows down the range to find an optimal input. To determine a PMF over $\mathrm{Pj}(M^*, \mathbb{F}^M)$, we have $|\mathrm{Pj}(M^*, \mathbb{F}^M)| - 1$ parameters to determine. We know $|\mathrm{Pj}(M^*, \mathbb{F}^M)| - 1 < q^{M^2/2 + \log_q M + c}$, where $c < 1.8$ is a constant (see Lemma [?]). But to determine a PMF over $\mathbb{F}^{T\times M}$, we have to fix $q^{TM} - 1$ parameters. It is clear that $q^{M^2/2 + \log_q M + c}/(q^{TM} - 1) \to 0$ when $T \to \infty$, or when $T > M/2 + 1/e + c$ and $q \to \infty$. Thus, using a-type inputs can significantly reduce the complexity of finding an optimal input distribution when i) $T$ is large or ii) $T > M/2 + 1/e + c$ and $q$ is large.

D. Proof of Theorem 2

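Lemma 2: Let $p_X$ be an input distribution of LOC$(H,T)$ with dimension $M\times N$. Define $p'_X : \mathbb{F}^{T\times M} \to \mathbb{R}$ as $p'_X(\mathbf{X}) = p_X(\Phi\mathbf{X})$, where $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$. We have i) $p'_X$ is a PMF, ii) $I(X;Y)|_{p_X} = I(X;Y)|_{p'_X}$ and iii) $I(\langle X\rangle;\langle Y\rangle)|_{p_X} = I(\langle X\rangle;\langle Y\rangle)|_{p'_X}$.

Proof: First, $p'_X$ is a PMF because $0 \le p'_X(\mathbf{X}) = p_X(\Phi\mathbf{X}) \le 1$ and

\[
\sum_{\mathbf{X}\in\mathbb{F}^{T\times M}} p'_X(\mathbf{X}) = \sum_{\mathbf{X}\in\mathbb{F}^{T\times M}} p_X(\Phi\mathbf{X}) = \sum_{\mathbf{X}\in\Phi\mathbb{F}^{T\times M}} p_X(\mathbf{X}) = \sum_{\mathbf{X}\in\mathbb{F}^{T\times M}} p_X(\mathbf{X}) = 1.
\]

Let $p_Y$ and $p'_Y$ be the PMFs of $Y$ when the inputs are $p_X$ and $p'_X$, respectively. We have

\[
\begin{aligned}
p'_Y(\mathbf{Y}) &= \sum_{\mathbf{X}\in\mathbb{F}^{T\times M}} p'_X(\mathbf{X})\, P_{Y|X}(\mathbf{Y}|\mathbf{X}) \\
&\overset{(a)}{=} \sum_{\mathbf{X}\in\mathbb{F}^{T\times M}} p_X(\Phi\mathbf{X})\, P_{Y|X}(\Phi\mathbf{Y}|\Phi\mathbf{X}) \\
&\overset{(b)}{=} \sum_{\mathbf{X}'\in\mathbb{F}^{T\times M}} p_X(\mathbf{X}')\, P_{Y|X}(\Phi\mathbf{Y}|\mathbf{X}') \\
&= p_Y(\Phi\mathbf{Y}),
\end{aligned}
\]

where (a) follows from Cor. 4 and $p'_X(\mathbf{X}) = p_X(\Phi\mathbf{X})$, and (b) follows by letting $\mathbf{X}' = \Phi\mathbf{X}$. Therefore,

\[
\begin{aligned}
I(X;Y)|_{p'_X} &= \sum_{\mathbf{X}} p'_X(\mathbf{X}) \sum_{\mathbf{Y}} P_{Y|X}(\mathbf{Y}|\mathbf{X}) \log_2 \frac{P_{Y|X}(\mathbf{Y}|\mathbf{X})}{p'_Y(\mathbf{Y})} \\
&\overset{(c)}{=} \sum_{\mathbf{X}} p_X(\Phi\mathbf{X}) \sum_{\mathbf{Y}} P_{Y|X}(\Phi\mathbf{Y}|\Phi\mathbf{X}) \log_2 \frac{P_{Y|X}(\Phi\mathbf{Y}|\Phi\mathbf{X})}{p_Y(\Phi\mathbf{Y})} \\
&= \sum_{\mathbf{X}'} p_X(\mathbf{X}') \sum_{\mathbf{Y}'} P_{Y|X}(\mathbf{Y}'|\mathbf{X}') \log_2 \frac{P_{Y|X}(\mathbf{Y}'|\mathbf{X}')}{p_Y(\mathbf{Y}')} \\
&= I(X;Y)|_{p_X},
\end{aligned}
\]

where (c) follows from Cor. 4.

The last equality in the lemma can be proved similarly. First,

\[
\begin{aligned}
p'_{\langle X\rangle}(U) &= \sum_{\mathbf{X}:\,\langle\mathbf{X}\rangle = U} p'_X(\mathbf{X}) = \sum_{\mathbf{X}:\,\langle\mathbf{X}\rangle = U} p_X(\Phi\mathbf{X}) \\
&\overset{(d)}{=} \sum_{\mathbf{X}:\,\langle\mathbf{X}\rangle = \Phi U} p_X(\mathbf{X}) = p_{\langle X\rangle}(\Phi U),
\end{aligned}
\]

where (d) follows from Lemma 20. Let $P'_{\langle Y\rangle|\langle X\rangle}(V|U)$ be the transition probability when the input is $p'_X$. For $U \le \mathbb{F}^T$ with $p'_{\langle X\rangle}(U) > 0$,

\[
\begin{aligned}
P'_{\langle Y\rangle|\langle X\rangle}(V|U) &= \frac{\sum_{\mathbf{X},\mathbf{Y}:\,\langle\mathbf{X}\rangle = U,\,\langle\mathbf{Y}\rangle = V} P_{Y|X}(\mathbf{Y}|\mathbf{X})\, p'_X(\mathbf{X})}{p'_{\langle X\rangle}(U)} \\
&= \frac{\sum_{\mathbf{X},\mathbf{Y}:\,\langle\mathbf{X}\rangle = U,\,\langle\mathbf{Y}\rangle = V} P_{Y|X}(\Phi\mathbf{Y}|\Phi\mathbf{X})\, p_X(\Phi\mathbf{X})}{p_{\langle X\rangle}(\Phi U)} \\
&= P_{\langle Y\rangle|\langle X\rangle}(\Phi V|\Phi U).
\end{aligned}
\]

Hence,

\[
p'_{\langle Y\rangle}(V) = \sum_{U} P'_{\langle Y\rangle|\langle X\rangle}(V|U)\, p'_{\langle X\rangle}(U) = \sum_{U} P_{\langle Y\rangle|\langle X\rangle}(\Phi V|\Phi U)\, p_{\langle X\rangle}(\Phi U) = p_{\langle Y\rangle}(\Phi V).
\]

Therefore,

\[
\begin{aligned}
I(\langle X\rangle;\langle Y\rangle)|_{p'_X} &= \sum_{U} p'_{\langle X\rangle}(U) \sum_{V} P'_{\langle Y\rangle|\langle X\rangle}(V|U) \log_2 \frac{P'_{\langle Y\rangle|\langle X\rangle}(V|U)}{p'_{\langle Y\rangle}(V)} \\
&= \sum_{U} p_{\langle X\rangle}(\Phi U) \sum_{V} P_{\langle Y\rangle|\langle X\rangle}(\Phi V|\Phi U) \log_2 \frac{P_{\langle Y\rangle|\langle X\rangle}(\Phi V|\Phi U)}{p_{\langle Y\rangle}(\Phi V)} \\
&= I(\langle X\rangle;\langle Y\rangle)|_{p_X}.
\end{aligned}
\]
■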



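Proof of Theorem 2: Consider a LOC with inaction period $T$. Let $p$ be an optimal input distribution for the channel. For $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$, define $p^{\Phi}$ as $p^{\Phi}(\mathbf{X}) = p(\Phi\mathbf{X})$. By Lemma 2, $p^{\Phi}$ also achieves the capacity of the LOC. Define $p^*$ as

\[
p^*(\mathbf{X}) = \frac{1}{|\mathrm{Fr}(\mathbb{F}^{T\times T})|} \sum_{\Phi\in\mathrm{Fr}(\mathbb{F}^{T\times T})} p^{\Phi}(\mathbf{X}).
\]

By the concavity of the mutual information, we know $p^*$ is also an optimal input for the channel.

Now we show that $p^*$ is a-type. Consider $\mathbf{X}, \mathbf{X}' \in \mathbb{F}^{T\times M}$ with $\langle\mathbf{X}^{\top}\rangle = \langle\mathbf{X}'^{\top}\rangle$. By Lemma 19, there exists $\Phi_0 \in \mathrm{Fr}(\mathbb{F}^{T\times T})$ such that $\mathbf{X}' = \Phi_0\mathbf{X}$. We have

\[
\begin{aligned}
p^*(\mathbf{X}') = p^*(\Phi_0\mathbf{X}) &= \frac{1}{|\mathrm{Fr}(\mathbb{F}^{T\times T})|} \sum_{\Phi\in\mathrm{Fr}(\mathbb{F}^{T\times T})} p^{\Phi}(\Phi_0\mathbf{X}) \\
&= \frac{1}{|\mathrm{Fr}(\mathbb{F}^{T\times T})|} \sum_{\Phi\in\mathrm{Fr}(\mathbb{F}^{T\times T})} p^{\Phi\Phi_0}(\mathbf{X}) \\
&= p^*(\mathbf{X}),
\end{aligned}
\]

where in the last equality we use $\mathrm{Fr}(\mathbb{F}^{T\times T}) = \mathrm{Fr}(\mathbb{F}^{T\times T})\,\Phi_0$. ■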

VI. Subspace Coding for Linear Operator Channels 

Subspace coding was first proposed for noncoherent transmission over RLCNs. Here we generalize the idea to LOCs and study subspace coding in a general setting.

A. Subspace Degradation of LOCs 

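In this section, we are interested in the Markov chain $\langle X\rangle \to X \to Y \to \langle Y\rangle$. The transition probability from $X$ to $Y$ is the channel law $P_{Y|X}(\mathbf{Y}|\mathbf{X}) = \Pr\{\mathbf{X}H = \mathbf{Y}\}$. The transition probability from $Y$ to $\langle Y\rangle$ is deterministic:

\[
P_{\langle Y\rangle|Y}(V|\mathbf{Y}) =
\begin{cases}
1 & \langle\mathbf{Y}\rangle = V,\\
0 & \text{o.w.}
\end{cases}
\]

Applying the property of Markov chains, we further know

\[
P_{\langle Y\rangle|X}(V|\mathbf{X}) = \sum_{\mathbf{Y}} P_{\langle Y\rangle|Y}(V|\mathbf{Y})\, P_{Y|X}(\mathbf{Y}|\mathbf{X}) = \sum_{\mathbf{Y}:\,\langle\mathbf{Y}\rangle = V} P_{Y|X}(\mathbf{Y}|\mathbf{X}).
\]

The transition probability $P_{X|\langle X\rangle}$ is undetermined for a LOC.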

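Definition 2: Consider LOC$(H,T)$ with transition probability $P_{Y|X}$. Given a transition probability $P_{X|\langle X\rangle}$, we have a new channel law given by

\[
\begin{aligned}
P_{\langle Y\rangle|\langle X\rangle}(V|U) &= \sum_{\mathbf{X}} P_{\langle Y\rangle|X}(V|\mathbf{X})\, P_{X|\langle X\rangle}(\mathbf{X}|U) \\
&= \sum_{\mathbf{X}:\,\langle\mathbf{X}\rangle = U}\ \sum_{\mathbf{Y}:\,\langle\mathbf{Y}\rangle = V} P_{Y|X}(\mathbf{Y}|\mathbf{X})\, P_{X|\langle X\rangle}(\mathbf{X}|U). \qquad (13)
\end{aligned}
\]

This channel, called a subspace degradation of LOC$(H,T)$, takes subspaces as input and output.

A subspace degradation of LOC$(H,T)$ is identified by $P_{X|\langle X\rangle}$. We take $\langle X\rangle$ and $\langle Y\rangle$ as the input and output of a subspace degradation, respectively. The mutual information between $\langle X\rangle$ and $\langle Y\rangle$ can be written as a function of $p_{\langle X\rangle}$ and $P_{\langle Y\rangle|\langle X\rangle}$, in which $P_{\langle Y\rangle|\langle X\rangle}$, given in (13), is a function of $P_{X|\langle X\rangle}(\mathbf{X}|U)$. The capacity of a subspace degradation of a LOC is $\max_{p_{\langle X\rangle}} I(\langle X\rangle;\langle Y\rangle)$. Therefore, the maximum achievable rate of subspace degradations of LOC$(H,T)$ is

\[
C_{\mathrm{SS}}(H,T) = \max_{P_{X|\langle X\rangle}}\ \max_{p_{\langle X\rangle}} I(\langle X\rangle;\langle Y\rangle).
\]

The rate $C_{\mathrm{SS}}(H,T)$ is achievable since $\max_{p_{\langle X\rangle}} I(\langle X\rangle;\langle Y\rangle)$ is achievable for any given $P_{X|\langle X\rangle}$.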
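Lemma 3: For LOC$(H,T)$, $I(\langle X\rangle;\langle Y\rangle)$ is determined by $p_X$ and we can write

\[
C_{\mathrm{SS}}(H,T) = \max_{p_X} I(\langle X\rangle;\langle Y\rangle).
\]

Proof: For a fixed LOC, we know that $I(\langle X\rangle;\langle Y\rangle)$ is determined by $p_{\langle X\rangle}$ and $P_{X|\langle X\rangle}$. We show that $p_{\langle X\rangle}(U)$ and $P_{X|\langle X\rangle}(\mathbf{X}|U)$ appearing in $I(\langle X\rangle;\langle Y\rangle)$ are determined by $p_X$. First, we obtain $p_{\langle X\rangle}$ from $p_X$ as shown in (7). Second, since

\[
P_{X|\langle X\rangle}(\mathbf{X}|U)\, p_{\langle X\rangle}(U) = \Pr\{X = \mathbf{X},\ \langle X\rangle = U\} =
\begin{cases}
p_X(\mathbf{X}) & \langle\mathbf{X}\rangle = U,\\
0 & \text{o.w.,}
\end{cases}
\]

we have

\[
P_{X|\langle X\rangle}(\mathbf{X}|U) =
\begin{cases}
p_X(\mathbf{X})/p_{\langle X\rangle}(U) & \langle\mathbf{X}\rangle = U,\ p_{\langle X\rangle}(U) > 0,\\
0 & \langle\mathbf{X}\rangle \ne U.
\end{cases} \tag{14}
\]

That means, for $U$ with $p_{\langle X\rangle}(U) > 0$, $P_{X|\langle X\rangle}(\mathbf{X}|U)$ is determined by $p_X$. Moreover, if $p_{\langle X\rangle}(U) = 0$, $P_{X|\langle X\rangle}(\mathbf{X}|U)$ does not appear in $I(\langle X\rangle;\langle Y\rangle)$. Thus, $I(\langle X\rangle;\langle Y\rangle)$ can be regarded as a function of the single variable $p_X$. This also implies that

\[
C_{\mathrm{SS}}(H,T) \ge \max_{p_X} I(\langle X\rangle;\langle Y\rangle).
\]

On the other hand, given $P_{X|\langle X\rangle}$ and $p_{\langle X\rangle}$, we have a PMF of $X$ given by

\[
p_X(\mathbf{X}) = p_{\langle X\rangle}(\langle\mathbf{X}\rangle)\, P_{X|\langle X\rangle}(\mathbf{X}\,|\,\langle\mathbf{X}\rangle),
\]

which establishes that

\[
C_{\mathrm{SS}}(H,T) \le \max_{p_X} I(\langle X\rangle;\langle Y\rangle).
\]

The proof is completed. ■
In the following, we will treat $I(\langle X\rangle;\langle Y\rangle)$ as a function of $p_X$ for a given LOC.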
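Definition 3: LOC$(H,T)$ is uniform if there exists a function $\mu : \mathrm{Pj}(\mathbb{F}^T)\times\mathrm{Pj}(\mathbb{F}^T) \to [0,1]$ such that

\[
\Pr\{\mathbf{Y} = \mathbf{X}H\} =
\begin{cases}
\mu(\langle\mathbf{X}\rangle, \langle\mathbf{Y}\rangle) & \langle\mathbf{Y}\rangle \subset \langle\mathbf{X}\rangle,\\
0 & \text{o.w.}
\end{cases}
\]

We can check that the three examples of LOCs in §III-D are all uniform. $C_{\mathrm{SS}}(H,T)$ gives a lower bound on $C(H,T)$. Moreover, this lower bound is tight for uniform LOCs.

Proposition 4: For a LOC, $I(X;Y) \ge I(\langle X\rangle;\langle Y\rangle)$, and equality is achieved by uniform LOCs.

Proof: See §VI-E. ■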
B. Subspace Coding

Since a subspace degradation of a LOC takes subspaces as input and output, the coding for this channel is called subspace coding, which was first used by Kötter and Kschischang for random linear network coding [26]. We give a generalized definition of subspace coding as follows.

Let $M^* = \min\{T, M\}$ and recall that $\mathrm{Pj}(M^*, \mathbb{F}^T)$ is the set of subspaces of $\mathbb{F}^T$ with dimension less than or equal to $M^*$. The $n$th Cartesian power of $\mathrm{Pj}(M^*, \mathbb{F}^T)$ is $\mathrm{Pj}^n(M^*, \mathbb{F}^T)$. An $n$-block subspace code is a subset of $\mathrm{Pj}^n(M^*, \mathbb{F}^T)$. Recall that the Grassmannian $\mathrm{Gr}(r, \mathbb{F}^T)$ is the set of all $r$-dimensional subspaces of $\mathbb{F}^T$. An $r$-dimensional (constant-dimensional) subspace code is a subset of $\mathrm{Gr}^n(r, \mathbb{F}^T)$, the $n$th Cartesian power of $\mathrm{Gr}(r, \mathbb{F}^T)$.

For LOC$(H,T)$, we can choose a transition probability $P_{X|\langle X\rangle}$ and apply a subspace code to its subspace degradation with $P_{X|\langle X\rangle}$. In other words, we transmit $U \in \mathrm{Pj}(M^*, \mathbb{F}^T)$ through the LOC by randomly choosing a matrix $\mathbf{X}$ according to the transition probability $P_{X|\langle X\rangle}(\mathbf{X}|U)$. The decoding of a subspace code only considers the subspace spanned by the channel output. So, for two receptions $\mathbf{Y}$ and $\mathbf{Y}'$ with $\langle\mathbf{Y}\rangle = \langle\mathbf{Y}'\rangle$, a subspace code decoder treats them as the same. The maximum achievable rate of subspace coding for LOC$(H,T)$ is given by $C_{\mathrm{SS}}(H,T)$.



C. A Decomposition of Mutual Information 

Theorem 3: For a LOC there exists an a-type input that maximizes $I(\langle X\rangle;\langle Y\rangle)$.

Proof: This theorem can be proved similarly to Theorem 2 by applying Lemma 2. ■
By Theorem 3, we know

\[
C_{\mathrm{SS}}(H,T) = \max_{p_X:\ \text{a-type}} I(\langle X\rangle;\langle Y\rangle).
\]

That is, we only need to consider a-type inputs to find $C_{\mathrm{SS}}(H,T)$.

For a random matrix $X$, recall that $\mathrm{rk}(X)$ is the RV representing the rank of $X$ (see (8) for the PMF). Similar to Lemma 3, for a LOC $I(\mathrm{rk}(X);\mathrm{rk}(Y))$ is determined by $p_X$ and $P_{Y|X}$. Define

\[
J(\mathrm{rk}(X);\mathrm{rk}(Y)) = \sum_{s\le r} p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s) \log_2 \frac{\chi^T_s}{\chi^r_s} = E\left[\log_2 \frac{\chi^T_{\mathrm{rk}(Y)}}{\chi^{\mathrm{rk}(X)}_{\mathrm{rk}(Y)}}\right], \tag{15}
\]

where $p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s)$ can be derived using $p_X$ and $P_{Y|X}$.
Lemma 4: For a LOC with a-type inputs,

\[
I(\langle X\rangle;\langle Y\rangle) = I(\mathrm{rk}(X);\mathrm{rk}(Y)) + J(\mathrm{rk}(X);\mathrm{rk}(Y)). \tag{16}
\]

Proof: The proof is done by rewriting the formulation of mutual information using the symmetric property and the definition of a-type inputs. See §VI-E for details. ■
In (16), $I(\mathrm{rk}(X);\mathrm{rk}(Y))$ is the mutual information of the ranks of the transmitted and received matrices. In other words, it is the rate transmitted using the matrix ranks. The term $J(\mathrm{rk}(X);\mathrm{rk}(Y))$ has an interpretation using set packing. The capacity contributed by $r$-dimensional transmissions and $s$-dimensional receptions is $\log_2 (\chi^T_s/\chi^r_s) = \log_2 \big(\binom{T}{s}_q / \binom{r}{s}_q\big)$, where $\binom{T}{s}_q$ is the total number of $s$-dimensional subspaces in $\mathbb{F}^T$, and $\binom{r}{s}_q$ is the total number of $s$-dimensional subspaces in an $r$-dimensional subspace. Treat an $s$-dimensional subspace in $\mathbb{F}^T$ as a set element. An $r$-dimensional transmission can be regarded as the collection of $s$-dimensional subspaces that span it. Then, the maximum set packing problem looks for the maximum number of pairwise disjoint collections of $s$-dimensional subspaces, each of which has cardinality $\binom{r}{s}_q$ and spans an $r$-dimensional subspace.

D. Lower Bound of the Maximum Achievable Rate 

Using Lemma 4, we derive a lower bound on the maximum achievable rate of subspace coding that depends only on the rank distribution of the transformation matrix.

Theorem 4: For LOC$(H,T)$ with dimension $M\times N$ and $T \ge M$,

\[
\bar{C}_{\mathrm{SS}}(H,T) \ge E\left[\log_2 \frac{\chi^T_{\mathrm{rk}(H)}}{\chi^M_{\mathrm{rk}(H)}}\right] \Big/ (T\log_2 q) = (1 - M/T)\,E[\mathrm{rk}(H)] + \epsilon(T,q), \tag{17}
\]

where

\[
\epsilon(T,q)\,T\log_2 q = \sum_{s} p_{\mathrm{rk}(H)}(s) \log_2 \frac{\zeta^T_s}{\zeta^M_s} < 1.8.
\]

This lower bound is achieved by the a-type input $p_X$ with $p_{\mathrm{rk}(X)}(M) = 1$.

Proof: See §VI-E. ■
Remark: Note that this bound depends on the rank distribution of the transformation matrix. This lower bound is tight for certain LOCs with sufficiently large $T$ (see Theorem 5).

The RHS of (17) implies that subspace coding can achieve a higher rate than channel training. As a quick summary, we see

\[
(1 - M/T)\,E[\mathrm{rk}(H)] + \epsilon(T,q) \le \bar{C}_{\mathrm{SS}}(H,T) \le \bar{C}(H,T) \le E[\mathrm{rk}(H)]. \tag{18}
\]

This lower bound is better than the one in Cor. 1. Furthermore,

\[
\begin{aligned}
\bar{C}(H,T) - \bar{C}_{\mathrm{SS}}(H,T) &\le E[\mathrm{rk}(H)] - (1 - M/T)\,E[\mathrm{rk}(H)] - \epsilon(T,q) \\
&= (M/T)\,E[\mathrm{rk}(H)] - \epsilon(T,q).
\end{aligned}
\]

The gap between the lower bound of $\bar{C}_{\mathrm{SS}}(H,T)$ and $\bar{C}_{\mathrm{CT}}(H,T)$ is quite small, as demonstrated in Fig. 4.
Prop. 2, however, is trivial for $T \le M$. We can use a method similar to extended channel training to obtain a better lower bound. We can foresee that the improved lower bound of $\bar{C}_{\mathrm{SS}}(H,T)$ is close to $\bar{C}_{\mathrm{ECT}}(H,T)$. We will not repeat the procedure here.
Fig. 4. Here we fix an H with M = 5 over the binary field. We plot the lower bounds for T from 5 to 1000.



E. Proofs 

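Proof of Prop. 4: Let $\mathcal{U} = \mathrm{Pj}(\mathbb{F}^T)$. We have

\[
\begin{aligned}
I(X;Y) &= \sum_{U,V\in\mathcal{U}}\ \sum_{\mathbf{X}:\,\langle\mathbf{X}\rangle = U}\ \sum_{\mathbf{Y}:\,\langle\mathbf{Y}\rangle = V} p(\mathbf{X},\mathbf{Y}) \log_2 \frac{p(\mathbf{X},\mathbf{Y})}{p_X(\mathbf{X})\,p_Y(\mathbf{Y})} \\
&\overset{(a)}{\ge} \sum_{U,V\in\mathcal{U}} p_{\langle X\rangle\langle Y\rangle}(U,V) \log_2 \frac{p_{\langle X\rangle\langle Y\rangle}(U,V)}{p_{\langle X\rangle}(U)\,p_{\langle Y\rangle}(V)} \\
&= I(\langle X\rangle;\langle Y\rangle),
\end{aligned}
\]

where (a) follows from the log-sum inequality. To prove this proposition, we only need to show that equality in (a) holds for uniform LOCs. We need to check that $P_{Y|X}(\mathbf{Y}|\mathbf{X})/p_Y(\mathbf{Y})$ is a constant for all $\mathbf{X}$ and $\mathbf{Y}$ with $\langle\mathbf{Y}\rangle = V \le \langle\mathbf{X}\rangle = U \le \mathbb{F}^T$. Fix an input distribution $p_X$. Since the LOC is uniform,

\[
\begin{aligned}
p_Y(\mathbf{Y}) &= \sum_{\mathbf{X}:\,V\le\langle\mathbf{X}\rangle} P_{Y|X}(\mathbf{Y}|\mathbf{X})\, p_X(\mathbf{X}) \\
&= \sum_{U'\le\mathbb{F}^T:\,V\le U'}\ \sum_{\mathbf{X}:\,\langle\mathbf{X}\rangle = U'} \mu(U',V)\, p_X(\mathbf{X}) \\
&= \sum_{U'\le\mathbb{F}^T:\,V\le U'} \mu(U',V)\, p_{\langle X\rangle}(U').
\end{aligned}
\]

Thus,

\[
\frac{P_{Y|X}(\mathbf{Y}|\mathbf{X})}{p_Y(\mathbf{Y})} = \frac{\mu(U,V)}{\sum_{U'\le\mathbb{F}^T:\,V\le U'} \mu(U',V)\, p_{\langle X\rangle}(U')}.
\]

This verifies that equality holds in (a). ■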

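Proof of Lemma 4: Fix an a-type input $p_X$. For $V \le U \le \mathbb{F}^T$ with $\dim(U) = r$ and $\dim(V) = s$, we first show

\[
p_{\langle X\rangle\langle Y\rangle}(U,V) = \frac{p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s)}{\binom{T}{r}_q \binom{r}{s}_q}. \tag{19}
\]

We only need to show that $p_{\langle X\rangle\langle Y\rangle}(U,V) = p_{\langle X\rangle\langle Y\rangle}(U',V')$ for any $V \le U \le \mathbb{F}^T$ and $V' \le U' \le \mathbb{F}^T$ with $\dim(U) = \dim(U')$ and $\dim(V) = \dim(V')$, because if this is true,

\[
\begin{aligned}
p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s) &= \sum_{\dim(U^*) = r,\ \dim(V^*) = s,\ V^*\subset U^*} p_{\langle X\rangle\langle Y\rangle}(U^*,V^*) \\
&= p_{\langle X\rangle\langle Y\rangle}(U,V) \sum_{\dim(U^*) = r,\ \dim(V^*) = s,\ V^*\subset U^*} 1 \\
&= \binom{T}{r}_q \binom{r}{s}_q\, p_{\langle X\rangle\langle Y\rangle}(U,V).
\end{aligned}
\]

Let

\[
A(m,U) = \{\mathbf{X}\in\mathbb{F}^{T\times m} : \langle\mathbf{X}\rangle = U\}.
\]

By Lemma [?], we can find $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$ such that $\Phi U = U'$ and $\Phi V = V'$. Then,

\[
\begin{aligned}
p_{\langle X\rangle\langle Y\rangle}(U,V) &= \sum_{\mathbf{X}\in A(M,U)} p_X(\mathbf{X}) \sum_{\mathbf{Y}\in A(N,V)} P_{Y|X}(\mathbf{Y}|\mathbf{X}) \\
&\overset{(b)}{=} \sum_{\mathbf{X}\in A(M,U)} p_X(\Phi\mathbf{X}) \sum_{\mathbf{Y}\in A(N,V)} P_{Y|X}(\Phi\mathbf{Y}|\Phi\mathbf{X}) \\
&\overset{(c)}{=} \sum_{\mathbf{X}\in A(M,\Phi U)} p_X(\mathbf{X}) \sum_{\mathbf{Y}\in A(N,\Phi V)} P_{Y|X}(\mathbf{Y}|\mathbf{X}) \\
&= p_{\langle X\rangle\langle Y\rangle}(\Phi U,\Phi V) \\
&= p_{\langle X\rangle\langle Y\rangle}(U',V').
\end{aligned}
\]

Here (b) follows because $p_X$ is a-type ($p_X(\mathbf{X}) = p_X(\Phi\mathbf{X})$) and $P_{Y|X}(\Phi\mathbf{Y}|\Phi\mathbf{X}) = P_{Y|X}(\mathbf{Y}|\mathbf{X})$ by Cor. 4, and (c) follows from $A(m,\Phi U) = \Phi A(m,U)$ (see Lemma 20). This proves (19).

Applying the property of a-type inputs,

\[
\begin{aligned}
p_{\langle X\rangle}(U) &= \sum_{\mathbf{X}\in A(M,U)} p_X(\mathbf{X}) = \sum_{\mathbf{X}\in A(M,U)} p_X(\Phi\mathbf{X}) \\
&= \sum_{\mathbf{X}\in\Phi A(M,U)} p_X(\mathbf{X}) \overset{(d)}{=} \sum_{\mathbf{X}\in A(M,U')} p_X(\mathbf{X}) = p_{\langle X\rangle}(U'),
\end{aligned}
\]

where (d) follows from Lemma 20. Therefore,

\[
p_{\langle X\rangle}(U) = \frac{p_{\mathrm{rk}(X)}(r)}{\binom{T}{r}_q}. \tag{20}
\]

Moreover,

\[
\begin{aligned}
p_{\langle Y\rangle}(V) &= \sum_{U:\,V\subset U} p_{\langle X\rangle\langle Y\rangle}(U,V) \\
&= \sum_{r\ge s}\ \sum_{U:\,V\subset U,\ \dim(U) = r} p_{\langle X\rangle\langle Y\rangle}(U,V) \\
&= \sum_{r\ge s} \frac{p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s)}{\binom{T}{r}_q\binom{r}{s}_q} \sum_{U:\,V\subset U,\ \dim(U) = r} 1 \\
&\overset{(e)}{=} \sum_{r\ge s} \frac{p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s)}{\binom{T}{r}_q\binom{r}{s}_q} \binom{T-s}{r-s}_q \\
&\overset{(f)}{=} \sum_{r\ge s} \frac{p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s)}{\binom{T}{r}_q\binom{r}{s}_q} \cdot \frac{\binom{T}{r}_q\binom{r}{s}_q}{\binom{T}{s}_q} \\
&= \frac{p_{\mathrm{rk}(Y)}(s)}{\binom{T}{s}_q}, \qquad (21)
\end{aligned}
\]

where (e) and (f) follow from Lemma 14. Substituting (19), (20) and (21) into $I(\langle X\rangle;\langle Y\rangle)$, we have

\[
\begin{aligned}
I(\langle X\rangle;\langle Y\rangle) &= \sum_{s\le r}\ \sum_{U:\,\dim(U) = r}\ \sum_{V\subset U:\,\dim(V) = s} p_{\langle X\rangle\langle Y\rangle}(U,V) \log_2 \frac{p_{\langle X\rangle\langle Y\rangle}(U,V)}{p_{\langle X\rangle}(U)\,p_{\langle Y\rangle}(V)} \\
&= \sum_{s\le r} p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s) \log_2 \left( \frac{p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s)}{p_{\mathrm{rk}(X)}(r)\,p_{\mathrm{rk}(Y)}(s)} \cdot \frac{\binom{T}{s}_q}{\binom{r}{s}_q} \right) \\
&= I(\mathrm{rk}(X);\mathrm{rk}(Y)) + \sum_{s\le r} p_{\mathrm{rk}(X)\mathrm{rk}(Y)}(r,s) \log_2 \frac{\chi^T_s}{\chi^r_s}.
\end{aligned}
\]

This completes the proof. ■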
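Proof of Theorem 4: Substituting the a-type input with $p_{\mathrm{rk}(X)}(M) = 1$ in Lemma 4, we have $I(\mathrm{rk}(X);\mathrm{rk}(Y)) = 0$ and $J(\mathrm{rk}(X);\mathrm{rk}(Y)) = \sum_s P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M) \log_2 (\chi^T_s/\chi^M_s)$. Given $\mathbf{X} \in \mathbb{F}^{T\times M}$ with rank $M$,

\[
P_{\mathrm{rk}(Y)|X}(s|\mathbf{X}) = \Pr\{\mathrm{rk}(\mathbf{X}H) = s\} = \Pr\{\mathrm{rk}(H) = s\}.
\]

Thus, $P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M) = \Pr\{\mathrm{rk}(H) = s\}$. Using the definitions of $\chi^m_r$ and $\zeta^m_r$, we can write

\[
\log_2 \frac{\chi^T_s}{\chi^M_s} = \log_2 \frac{\zeta^T_s\, q^{Ts}}{\zeta^M_s\, q^{Ms}} = (T - M)\,s \log_2 q + \log_2 \frac{\zeta^T_s}{\zeta^M_s}.
\]

Since $\zeta^T_s < 1$,

\[
\log_2 \frac{\zeta^T_s}{\zeta^M_s} < \log_2 \frac{1}{\zeta^M_s} < 1.8, \tag{22}
\]

where the last inequality follows from Lemma 15. So

\[
\begin{aligned}
J(\mathrm{rk}(X);\mathrm{rk}(Y)) &= \sum_s p_{\mathrm{rk}(H)}(s)\,(T - M)\,s \log_2 q + \sum_s p_{\mathrm{rk}(H)}(s) \log_2 \frac{\zeta^T_s}{\zeta^M_s} \\
&= (T - M) \log_2 q\; E[\mathrm{rk}(H)] + \epsilon(T,q)\,T \log_2 q,
\end{aligned}
\]

where $\epsilon(T,q) = \sum_s p_{\mathrm{rk}(H)}(s) \log_2 (\zeta^T_s/\zeta^M_s)/(T\log_2 q) < 1.8/(T\log_2 q)$. The proof is completed by noting that $C_{\mathrm{SS}}(H,T) \ge J(\mathrm{rk}(X);\mathrm{rk}(Y))$. ■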
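The quantities in Theorem 4 are easy to evaluate for a given rank distribution. The sketch below is illustrative only: it assumes $\chi^m_r = \prod_{i<r}(q^m - q^i)$ and $\zeta^m_r = \chi^m_r q^{-mr}$, and the rank PMF, the parameters and all function names are hypothetical. Logarithms are taken of exact integers so that large $T$ does not overflow floating point.

```python
import math

def chi(m, r, q):
    """chi^m_r = prod_{i<r} (q^m - q^i): number of m x r full-column-rank matrices."""
    out = 1
    for i in range(r):
        out *= q ** m - q ** i
    return out

def gauss_binom(m, r, q):
    """Gaussian binomial: number of r-dimensional subspaces of F_q^m."""
    return chi(m, r, q) // chi(r, r, q)

def ss_lower_bound(rank_pmf, M, T, q):
    """Normalized bound (17): E[log2(chi^T_rk / chi^M_rk)] / (T log2 q)."""
    total = sum(p * (math.log2(chi(T, s, q)) - math.log2(chi(M, s, q)))
                for s, p in enumerate(rank_pmf) if p > 0)
    return total / (T * math.log2(q))

def channel_training_rate(rank_pmf, M, T):
    """Normalized channel training rate (1 - M/T) E[rk(H)]."""
    return (1 - M / T) * sum(s * p for s, p in enumerate(rank_pmf))

# Hypothetical example: M = 2, T = 4, binary field, Pr{rk(H) = 0, 1, 2}.
M, T, q = 2, 4, 2
rank_pmf = [0.25, 0.25, 0.5]
bound = ss_lower_bound(rank_pmf, M, T, q)
epsilon = bound - channel_training_rate(rank_pmf, M, T)
```

Here `epsilon` is the term $\epsilon(T,q)$ in (17); multiplied by $T\log_2 q$ it stays below 1.8, and `bound` lies between the channel training rate and $E[\mathrm{rk}(H)]$, in line with (18).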

VII. Optimal Inputs for Subspace Coding 

In this section, we show that using constant-dimensional subspace coding is almost as good as the general 
(multi-dimensional) subspace coding. 

A. A Formulation of a-type Inputs 

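Lemma 5: A function $p : \mathbb{F}^{T\times M} \to \mathbb{R}$ is an a-type PMF if and only if it can be written as

\[
p(\mathbf{X}) = \frac{R(\mathrm{rk}(\mathbf{X}))\, Q_{\mathrm{rk}(\mathbf{X})}(\langle\mathbf{X}^{\top}\rangle)}{\chi^T_{\mathrm{rk}(\mathbf{X})}}, \tag{23}
\]

where $Q_r(\cdot)$ is a PMF over $\mathrm{Gr}(r, \mathbb{F}^M)$ and $R$ is a PMF over $\{0, 1, \ldots, M\}$.

Proof: If $p$ can be written as (23), then by Lemma 1 $p$ is an a-type PMF. On the other hand, if $p$ is an a-type PMF, it can be written as (12). Let

\[
R(r) = \sum_{U:\,\dim(U) = r} Q(U).
\]

For $r$ such that $R(r) > 0$, let

\[
Q_r(U) =
\begin{cases}
Q(U)/R(r) & \dim(U) = r,\\
0 & \text{o.w.}
\end{cases}
\]

For $r$ such that $R(r) = 0$, let $Q_r(\cdot)$ be any PMF over $\mathrm{Gr}(r, \mathbb{F}^M)$. Since $Q_{\dim(U)}(U)\,R(\dim(U)) = Q(U)$, we see that $p$ can be written as (23). ■
When using the formulation in (23), $I(\mathrm{rk}(X);\mathrm{rk}(Y))$ and $J(\mathrm{rk}(X);\mathrm{rk}(Y))$ can be written as functions of $Q_r(U)$ and $R(r)$ as follows. Using the property of Markov chains,

\[
\begin{aligned}
P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r) &= \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U)\, P_{\langle X^{\top}\rangle|\mathrm{rk}(X)}(U|r) \\
&= \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U)\, Q_r(U), \qquad (24)
\end{aligned}
\]

in which $P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U)$, given in Cor. 3, is a function of $p_H$ and is not related to $Q_r(U)$ and $R(r)$. Thus, we can write

\[
I(\mathrm{rk}(X);\mathrm{rk}(Y)) = \sum_r R(r) \sum_s P(s|r) \log_2 \frac{P(s|r)}{\sum_{r'} R(r')\,P(s|r')}, \tag{25}
\]

in which $P(s|r) = P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r)$ is given in (24); and

\[
J(\mathrm{rk}(X);\mathrm{rk}(Y)) = \sum_r R(r) \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} Q_r(U)\, g(U),
\]

in which

\[
g(U) = \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) \log_2 \frac{\chi^T_s}{\chi^{\dim(U)}_s}.
\]

Note that $g(U)$ only depends on the distribution of $H$, but does not depend on the input. Define

\[
\mathrm{rk}^*(H) = \max\{r : \Pr\{\mathrm{rk}(H) = r\} > 0\},
\]

i.e., the maximum rank that the transformation matrix takes with nonzero probability.

Lemma 6: Consider LOC$(H,T)$ with dimension $M\times N$ and $T \ge M$. Fix an a-type input. For $V \le \mathbb{F}^M$ with $\dim(V) = r < \mathrm{rk}^*(H)$,

\[
g(\mathbb{F}^M) - g(V) > \Theta(T,r,H) \log_2 q,
\]

where

\[
\Theta(T,r,H) = (T - M)(\mathrm{rk}^*(H) - r)\, p_{\mathrm{rk}(H)}(\mathrm{rk}^*(H)) - r(M - r) + \log_q \zeta^r_r.
\]

Proof: See §VII-E. ■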
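B. Optimal Inputs for Subspace Coding

We treat $Q_r(U)$ and $R(r)$ as the variables to maximize $I(\langle X\rangle;\langle Y\rangle)$. By the KKT conditions, a set of necessary and sufficient conditions for an a-type input with variables $Q_r(U)$ and $R(r)$ to achieve $C_{\mathrm{SS}}(H,T)$ is that

\[
\frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial Q_r(U)} + R(r)\,g(U) = \lambda_r, \qquad \forall r,\ U\in\mathrm{Gr}(r,\mathbb{F}^M) : Q_r(U) > 0, \tag{27a}
\]

\[
\frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial Q_r(U)} + R(r)\,g(U) \le \lambda_r, \qquad \forall r,\ U\in\mathrm{Gr}(r,\mathbb{F}^M) : Q_r(U) = 0, \tag{27b}
\]

\[
\frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(r)} + \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} Q_r(U)\,g(U) = \lambda, \qquad \forall r : R(r) > 0, \tag{27c}
\]

\[
\frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(r)} + \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} Q_r(U)\,g(U) \le \lambda, \qquad \forall r : R(r) = 0, \tag{27d}
\]

where the partial derivatives are

\[
\frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial Q_r(U)} = R(r) \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) \log_2 \frac{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r)}{p_{\mathrm{rk}(Y)}(s)} - \log_2 e,
\]

and

\[
\frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(r)} = \sum_s P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r) \log_2 \frac{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r)}{p_{\mathrm{rk}(Y)}(s)} - \log_2 e.
\]

We can check that

\[
C_{\mathrm{SS}}(H,T) = \lambda + \log_2 e,
\]

and

\[
\lambda = \sum_r \lambda_r + (M - 1) \log_2 e.
\]

The above optimization problem of finding an optimal input for subspace coding is in general hard. We have already shown that for large $T$, the $M$-dimensional a-type input gives a good approximation of the channel capacity (see Prop. [?]). Here, we can further improve the result for a class of LOCs.

Definition 4: A random matrix $H$ with dimension $M\times N$ is regular if $p_{\mathrm{rk}(H)}(s) > 0$ for $0 \le s \le M$. LOC$(H,T)$ is regular if $H$ is regular.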
Theorem 5: Consider regular LOC$(H,T)$ with dimension $M\times N$. There exists $T_1$ such that when $T > T_1$, $C_{\mathrm{SS}}$ is achieved by the a-type input with $R(M) = 1$. In this case,

\[
C_{\mathrm{SS}}(H,T) = g(\mathbb{F}^M) = \sum_s p_{\mathrm{rk}(H)}(s) \log_2 \frac{\chi^T_s}{\chi^M_s} = E\left[\log_2 \frac{\chi^T_{\mathrm{rk}(H)}}{\chi^M_{\mathrm{rk}(H)}}\right].
\]

Proof: See §VII-E. ■

Assume $M \le N$. Since $p_{\mathrm{rk}(H_{\mathrm{pure}})}(r) > 0$ for $r \le M$, $H_{\mathrm{pure}}$ is regular.

C. Optimal Constant-Rank Inputs 

An input for a LOC with $p_{\mathrm{rk}(X)}(r) = 1$ is called a constant-rank or rank-$r$ input distribution. In the context of subspace coding, a rank-$r$ input corresponds to $r$-dimensional subspace coding. Our discussion of constant-rank inputs for subspace coding is therefore equivalent to a discussion of constant-dimensional subspace coding.

Let

\[
C_{\mathrm{C\text{-}SS}}(H,T) = \max_{p_X:\ \text{constant-rank}} I(\langle X\rangle;\langle Y\rangle),
\]

so that $C_{\mathrm{C\text{-}SS}}(H,T)$ is the maximum achievable rate of constant-dimensional subspace coding. Let $\bar{C}_{\mathrm{C\text{-}SS}}(H,T)$ be the normalization of $C_{\mathrm{C\text{-}SS}}(H,T)$ by $T\log_2 q$. The rank of a constant-rank input that achieves $C_{\mathrm{C\text{-}SS}}(H,T)$ is called an optimal input rank. We show that to find an optimal input rank, we only need to consider a-type inputs. Moreover, we can determine $C_{\mathrm{C\text{-}SS}}(H,T)$ and an optimal input rank from channel statistics that are sufficient to calculate $g(U)$. See Theorems 6 and 7 for details.

Theorem 6: For any LOC, there exists a constant-rank a-type input that achieves $C_{\mathrm{C\text{-}SS}}(H,T)$.

Proof: The proof is similar to the proof of Theorem 2. See §VII-E. ■

Theorem 7: For LOC$(H,T)$ with dimension $M\times N$, let

\[
U^* = \arg\max_{U\in\mathrm{Pj}(M^*,\mathbb{F}^M)} g(U).
\]

Then, $r^* = \dim(U^*)$ is an optimal input rank and $C_{\mathrm{C\text{-}SS}}(H,T) = g(U^*)$. Furthermore,

\[
\bar{C}_{\mathrm{SS}}(H,T) - \bar{C}_{\mathrm{C\text{-}SS}}(H,T) \le \frac{\log_2 \min\{M,N\}}{T\log_2 q}.
\]

Proof: See §VII-E. ■
Theorem [7] also bounds the loss of rate when using constant-dimensional subspace coding. Assume M = N = 5, 
T = 10, q = 4 and E[rk(i?)] = M/4. We have 

\[
\frac{C_{\mathrm{SS}}(H,T) - C_{\mathrm{C\text{-}SS}}(H,T)}{C_{\mathrm{SS}}(H,T)} \le \frac{\log_2 M}{T\log_2 q\,(1 - M/T)\,E[\mathrm{rk}(H)]}
\]

= 9.8 



So the loss of rate is marginal for typical channel parameters. 
D. Optimal Input Rank

To evaluate the results in Theorems 6 and 7, we require the distribution of the transformation matrix. Now we show that in some cases, we can relax this requirement significantly. For LOC$(H,T)$, recall that

\[
\mathrm{rk}^*(H) = \max\{r : \Pr\{\mathrm{rk}(H) = r\} > 0\}.
\]

Theorem 8: For LOC$(H,T)$, there exists $T_0$ such that when $T \ge T_0$, $r^* \ge \mathrm{rk}^*(H)$, where $r^*$ is the optimal input rank given in Theorem 7.

Proof: Suppose the dimension of the LOC is $M\times N$. Fix $T_0$ such that $\Theta(T_0,r,H) > 0$ for all $r < \mathrm{rk}^*(H)$. This is possible because $\Theta(T,r,H)$ is a linearly increasing function of $T$ for all $r < \mathrm{rk}^*(H)$. Assume $T \ge T_0$ and $r^* < \mathrm{rk}^*(H)$. For any $V \le \mathbb{F}^M$ with $\dim(V) < \mathrm{rk}^*(H) \le M$, by Lemma 6,

\[
g(\mathbb{F}^M) > g(V).
\]

Thus, we have a contradiction to $r^* < \mathrm{rk}^*(H)$. ■
Theorem 8 narrows down the range to search for an optimal input rank for large $T$. When $\mathrm{rk}^*(H) = M$, it tells that $M$ is an optimal input rank when $T$ is sufficiently large. The proof of Theorem 8 tells how to find such a $T_0$.

As an example, let us check the optimal input rank of LOC$(H_{\mathrm{full}}, T)$. We know $\mathrm{rk}^*(H_{\mathrm{full}}) = M$ and $p_{\mathrm{rk}(H_{\mathrm{full}})}(M) = 1$. By Theorem 8, there exists $T_0$ such that when $T \ge T_0$, $r^* = M$. Now we want to know the value of $T_0$. From the proof of Theorem 8, we know that $T_0$ should satisfy

\[
\Theta(T_0, r, H_{\mathrm{full}}) > 0, \qquad \forall r < M.
\]

In other words, $T_0$ should satisfy

\[
(r - T_0/2)^2 - (T_0/2 - M)^2 + \log_q \zeta^r_r > 0, \qquad \forall r < M. \tag{28}
\]

If $M < T_0 \le 2M - 1$, (28) does not hold for $r = M - 1$. If $T_0 = 2M$, the minimum value of the left-hand side of (28) is obtained for $r = M - 1$, i.e., $1 + \log_q \zeta^{M-1}_{M-1}$, which is positive when $q > 2$. Similarly, we can check that $T_0 = 2M + 1$ is sufficient for any field size. As a conclusion, when i) $q > 2$ and $T \ge 2M$ or ii) $q = 2$ and $T \ge 2M + 1$, the $M$-dimensional a-type input is an optimal constant-rank input for LOC$(H_{\mathrm{full}}, T)$.
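The threshold in this example can be found numerically by testing (28) directly. The following is a minimal sketch, assuming $\zeta^r_r = \prod_{i<r}(1 - q^{i-r})$ and using the equivalent form $(T_0 - M)(M - r) - r(M - r) + \log_q \zeta^r_r$ of the left-hand side of (28); the function names are hypothetical.

```python
import math

def zeta(m, r, q):
    """zeta^m_r = prod_{i<r} (1 - q^(i-m)): fraction of m x r matrices with rank r."""
    out = 1.0
    for i in range(r):
        out *= 1.0 - q ** (i - m)
    return out

def theta_full(T, r, M, q):
    """Theta(T, r, H_full) = (T - M)(M - r) - r(M - r) + log_q zeta^r_r."""
    return ((T - M) * (M - r) - r * (M - r)
            + math.log2(zeta(r, r, q)) / math.log2(q))

def min_T0(M, q):
    """Smallest T0 >= M with Theta(T0, r, H_full) > 0 for every r < M."""
    T = M
    while not all(theta_full(T, r, M, q) > 0 for r in range(M)):
        T += 1
    return T

# Smallest admissible T0 for a few hypothetical (M, q) pairs.
thresholds = {(M, q): min_T0(M, q) for M in (2, 3, 5) for q in (2, 3, 4)}
```

Because $\Theta$ increases linearly in $T$ for every $r < M$, the first $T_0$ that passes the test also works for all larger $T$. For the pairs tried here the search returns $2M$ when $q > 2$ and $2M + 1$ when $q = 2$, in agreement with the conclusion above.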

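E. Proofs

Proof of Lemma 6: Let $U = \mathbb{F}^M$. Since $V \le U$, we can find a full rank $M\times M$ matrix

\[
\mathbf{D} = \begin{bmatrix} \mathbf{D}_1 \\ \mathbf{D}_2 \end{bmatrix}
\]

such that $\langle\mathbf{D}^{\top}\rangle = U$ and $\langle\mathbf{D}_1^{\top}\rangle = V$. By Cor. 3,

\[
\sum_{s'>s} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s'|V) = \Pr\{\mathrm{rk}(\mathbf{D}_1 H) > s\},
\]

and

\[
P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) = \Pr\{\mathrm{rk}(\mathbf{D}H) = s\} = \Pr\{\mathrm{rk}(H) = s\}.
\]

We know $\Pr\{\mathrm{rk}(H) > s\} \ge \Pr\{\mathrm{rk}(\mathbf{D}_1 H) > s\}$. So

\[
\sum_{s'>s} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s'|U) \ge \sum_{s'>s} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s'|V).
\]

Moreover, for $s$ such that $r \le s < \mathrm{rk}^*(H)$,

\[
\sum_{s'>s} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s'|V) = 0.
\]

Thus,

\[
\begin{aligned}
\sum_s s \left( P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) - P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \right)
&= \sum_{k\ge 0} \sum_{s>k} \left( P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) - P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \right) \\
&\ge \sum_{k:\,\mathrm{rk}^*(H)>k\ge r}\ \sum_{s:\,s>k} P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) \\
&\ge \sum_{k:\,\mathrm{rk}^*(H)>k\ge r} \Pr\{\mathrm{rk}(H) = \mathrm{rk}^*(H)\} \\
&= (\mathrm{rk}^*(H) - r)\,\Pr\{\mathrm{rk}(H) = \mathrm{rk}^*(H)\}. \qquad (29)
\end{aligned}
\]

By definition,

\[
\begin{aligned}
\frac{g(U) - g(V)}{\log_2 q}
&= \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) \left( (T - M)s + \log_q \frac{\zeta^T_s}{\zeta^M_s} \right)
 - \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \left( (T - r)s + \log_q \frac{\zeta^T_s}{\zeta^r_s} \right) \\
&= (T - M) \sum_s s \left( P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) - P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \right)
 - (M - r) \sum_s s\, P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \\
&\quad + \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) \log_q \frac{\zeta^T_s}{\zeta^M_s}
 - \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \log_q \frac{\zeta^T_s}{\zeta^r_s} \\
&> (T - M)(\mathrm{rk}^*(H) - r)\,\Pr\{\mathrm{rk}(H) = \mathrm{rk}^*(H)\} - r(M - r) + \log_q \zeta^r_r,
\end{aligned}
\]

where the last inequality follows from (29),

\[
(M - r) \sum_s s\, P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \le r(M - r),
\]

\[
\sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|U) \log_q \frac{\zeta^T_s}{\zeta^M_s} \ge 0,
\]

and

\[
\sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \log_q \frac{\zeta^T_s}{\zeta^r_s} < \sum_s P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|V) \log_q \frac{1}{\zeta^r_s} \le \log_q \frac{1}{\zeta^r_r}.
\]
■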



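Proof of Theorem 5: To prove the theorem, we only need to check that the a-type input with $R(M) = 1$ satisfies (27). Conditions (27a) and (27b) with $r < M$ are satisfied by $\lambda_r = -\log_2 e$ because $R(r) = 0$. Since $Q_M(\mathbb{F}^M) = 1$, we check condition (27a) with $r = M$. Since $P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M) = p_{\mathrm{rk}(Y)}(s)$,

\[
\left. \frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial Q_M(\mathbb{F}^M)} \right|_{R(M) = 1} = -\log_2 e.
\]

So, (27a) with $r = M$ is satisfied by $\lambda_M = g(\mathbb{F}^M) - \log_2 e$. This completes the verification of (27a) and (27b).

The above analysis also tells that $\lambda = \lambda_M$. Now we check (27c) and (27d) with $\lambda = g(\mathbb{F}^M) - \log_2 e$. Since $R(M) = 1$, condition (27c) should be satisfied with $r = M$. This is true since

\[
\left. \frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(M)} \right|_{R(M) = 1} + g(\mathbb{F}^M) = -\log_2 e + g(\mathbb{F}^M).
\]

Next, we check condition (27d) for $r < M$. We know

\[
\left. \frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(r)} \right|_{R(M) = 1} = \sum_s P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r) \log_2 \frac{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r)}{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M)} - \log_2 e.
\]

Since

\[
P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M) = P_{\mathrm{rk}(Y)|\langle X^{\top}\rangle}(s|\mathbb{F}^M) = \Pr\{\mathrm{rk}(\mathbf{D}H) = s\} = \Pr\{\mathrm{rk}(H) = s\},
\]

we have

\[
\begin{aligned}
\sum_s P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r) \log_2 \frac{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r)}{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M)}
&\le \sum_s P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r) \log_2 \frac{1}{P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|M)} \\
&= \sum_s P_{\mathrm{rk}(Y)|\mathrm{rk}(X)}(s|r) \log_2 \frac{1}{p_{\mathrm{rk}(H)}(s)} \\
&\le -\log_2 \min_{0\le s\le M} p_{\mathrm{rk}(H)}(s).
\end{aligned}
\]

That is,

\[
\left. \frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(r)} \right|_{R(M) = 1} \le -\log_2 \min_{0\le s\le M} p_{\mathrm{rk}(H)}(s) - \log_2 e.
\]

Fix $T_1$ such that $\Theta(T_1,r,H) \ge -\log_2 \min_{0\le s\le M} p_{\mathrm{rk}(H)}(s)$ for all $r < M$. This is possible because $\Theta(T,r,H)$ increases linearly with $T$ and $-\log_2 \min_{0\le s\le M} p_{\mathrm{rk}(H)}(s)$ does not change with $T$. By Lemma 6, $g(\mathbb{F}^M) > g(U) - \log_2 \min_{0\le s\le M} p_{\mathrm{rk}(H)}(s)$ for all $U \in \mathrm{Gr}(r,\mathbb{F}^M)$. Thus

\[
\begin{aligned}
\lambda &= g(\mathbb{F}^M) - \log_2 e \\
&> \max_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} g(U) - \log_2 \min_{0\le s\le M} p_{\mathrm{rk}(H)}(s) - \log_2 e \\
&\ge \left. \frac{\partial I(\mathrm{rk}(X);\mathrm{rk}(Y))}{\partial R(r)} \right|_{R(M) = 1} + \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} Q_r(U)\,g(U).
\end{aligned}
\]

Hence, condition (27d) with $r < M$ is satisfied. ■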
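Proof of Theorem 6: Consider a LOC with inaction period $T$. Let $p_X$ be an optimal constant-rank input with $p_{\mathrm{rk}(X)}(r^*) = 1$. For $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$, define $p^{\Phi}_X$ as $p^{\Phi}_X(\mathbf{X}) = p_X(\Phi\mathbf{X})$. It is clear that $p^{\Phi}_{\mathrm{rk}(X)}(r^*) = 1$. By Lemma 2, $p^{\Phi}_X$ is also an optimal constant-rank input. Define $p^*_X$ as

\[
p^*_X(\mathbf{X}) = \frac{1}{|\mathrm{Fr}(\mathbb{F}^{T\times T})|} \sum_{\Phi\in\mathrm{Fr}(\mathbb{F}^{T\times T})} p^{\Phi}_X(\mathbf{X}).
\]

By the concavity of the mutual information, we know $p^*_X$ is also an optimal constant-rank input. We can check that $p^*_X$ is a-type as in the proof of Theorem 2. ■
Proof of Theorem 7: For an $r$-dimensional a-type input,

\[
I(\langle X\rangle;\langle Y\rangle) = \sum_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} Q_r(U)\,g(U) \le \max_{U\in\mathrm{Gr}(r,\mathbb{F}^M)} g(U) \le g(U^*).
\]

Thus $C_{\mathrm{C\text{-}SS}} \le g(U^*)$. On the other hand, for the $r^*$-dimensional a-type input with $p_{\langle X^{\top}\rangle}(U^*) = 1$, $C_{\mathrm{C\text{-}SS}} \ge I(\langle X\rangle;\langle Y\rangle) = g(U^*)$.

Furthermore, for an a-type input,

\[
\begin{aligned}
I(\langle X\rangle;\langle Y\rangle) - C_{\mathrm{C\text{-}SS}} &= I(\mathrm{rk}(X);\mathrm{rk}(Y)) + J(\mathrm{rk}(X);\mathrm{rk}(Y)) - g(U^*) \\
&\le I(\mathrm{rk}(X);\mathrm{rk}(Y)) \\
&\le \log_2 \min\{M,N\}.
\end{aligned}
\]

Thus, $C_{\mathrm{SS}} - C_{\mathrm{C\text{-}SS}} = \max_{p_X:\ \text{a-type}} I(\langle X\rangle;\langle Y\rangle) - C_{\mathrm{C\text{-}SS}} \le \log_2 \min\{M,N\}$. ■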

VIII. Coding for Linear Operator Channels 
Starting from this section, we consider code design for LOC$(H,T)$.
A. Using Channel Training or Subspace Coding

We have considered two kinds of coding schemes for noncoherent LOCs: channel training and subspace coding. For channel training, all the input matrices $\mathbf{X}$ have the form

\[
\mathbf{X} = \begin{bmatrix} \mathbf{I} \\ \tilde{\mathbf{X}} \end{bmatrix}, \tag{30}
\]

where $\mathbf{I}$ is an identity matrix. For such a transmission, the received matrix is

\[
\mathbf{Y} = \begin{bmatrix} \mathbf{I} \\ \tilde{\mathbf{X}} \end{bmatrix} \mathbf{H} = \begin{bmatrix} \mathbf{H} \\ \tilde{\mathbf{X}}\mathbf{H} \end{bmatrix},
\]

where $\mathbf{H}$ is the instance of $H$. The receiver can use the first part of $\mathbf{Y}$ to recover $\mathbf{H}$ and use this information to decode $\tilde{\mathbf{X}}$. We have shown that the normalized maximum achievable rate using channel training is

\[
\bar{C}_{\mathrm{CT}}(H,T) = (1 - M/T)\,E[\mathrm{rk}(H)].
\]
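The channel training transmission above can be simulated in a few lines. The sketch below assumes a prime field GF($p$) and a transformation matrix of rank $M$ so that the payload is uniquely decodable; `solve_left` is a hypothetical helper that performs Gaussian elimination, and it relies on `pow(a, -1, p)` for modular inverses (Python 3.8 or later).

```python
def matmul(A, B, p):
    """Matrix product over the prime field GF(p); matrices are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def solve_left(H, Y, p):
    """Solve X H = Y over GF(p) for X, assuming the M x N matrix H has rank M.

    Returns None when H is rank deficient (the payload is then not unique).
    """
    M, N = len(H), len(H[0])
    # Transposed system H^T X^T = Y^T: row j holds column j of H, then column j of Y.
    rows = [[H[i][j] for i in range(M)] + [y[j] for y in Y] for j in range(N)]
    pivot_row = 0
    for c in range(M):
        piv = next((k for k in range(pivot_row, N) if rows[k][c] % p), None)
        if piv is None:
            return None
        rows[pivot_row], rows[piv] = rows[piv], rows[pivot_row]
        inv = pow(rows[pivot_row][c], -1, p)
        rows[pivot_row] = [v * inv % p for v in rows[pivot_row]]
        for k in range(N):
            if k != pivot_row and rows[k][c]:
                f = rows[k][c]
                rows[k] = [(a - f * b) % p for a, b in zip(rows[k], rows[pivot_row])]
        pivot_row += 1
    # The first M reduced rows are [I | X^T]; read X back row by row.
    return [[rows[i][M + t] for i in range(M)] for t in range(len(Y))]

# Hypothetical instance: p = 5, M = 2, N = 3, T = 5.
p, M = 5, 2
H = [[1, 2, 3], [0, 1, 4]]                   # unknown to the receiver, rank M
payload = [[1, 0], [2, 3], [4, 4]]           # the (T - M) x M data block
X = [[1, 0], [0, 1]] + payload               # identity header, then payload, as in (30)
Y = matmul(X, H, p)                          # channel output Y = XH
decoded = solve_left(Y[:M], Y[M:], p)        # header part of Y reveals H
```

The first $M$ rows of `Y` equal `H`, so the receiver never needs prior knowledge of the channel. If `H` had rank below $M$, `solve_left` would return `None`, which is the situation in which a plain channel training code cannot be decoded by solving linear equations alone.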

For subspace coding, a codeword contains a sequence of subspaces, and the transmission of a subspace through a LOC involves the transformation of a subspace to a matrix. The decoding also depends only on the subspace spanned by the received matrix. For more details, see our discussion in §VI-B. We have shown that the normalized maximum achievable rate using subspace coding satisfies

$$\mathrm{E}[\mathrm{rk}(H)] \ge C_{\mathrm{SS}}(H,T) \ge (1 - M/T)\,\mathrm{E}[\mathrm{rk}(H)] + \epsilon(T,q),$$

where $0 < \epsilon(T,q) < 1.8/(T\log_2 q)$. We have shown that the lower bound on $C_{\mathrm{SS}}(H,T)$ is tight for regular LOCs when $T$ is sufficiently large. Therefore

$$\epsilon(T,q) \le C_{\mathrm{SS}}(H,T) - C_{\mathrm{CT}}(H,T) \le (M/T)\,\mathrm{E}[\mathrm{rk}(H)].$$

So using channel training does not lose much in rate, especially when $T$ is large.
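To make the size of this gap concrete, the following minimal sketch evaluates $C_{\mathrm{CT}}(H,T)$ and the two bounds quoted above for a given rank distribution. The function names and the example distribution are illustrative assumptions, not part of the paper.

```python
from math import log2

def expected_rank(rank_pmf):
    """E[rk(H)] for a rank distribution given as {rank: probability}."""
    return sum(r * p for r, p in rank_pmf.items())

def channel_training_rate(rank_pmf, M, T):
    """Normalized channel-training rate C_CT(H, T) = (1 - M/T) E[rk(H)]."""
    if T <= M:
        # The codeword form (30) needs at least one payload row below the identity.
        raise ValueError("channel training needs T > M")
    return (1 - M / T) * expected_rank(rank_pmf)

def gap_bounds(rank_pmf, M, T, q):
    """Return (eps_max, gap_max): eps(T, q) < eps_max and C_SS - C_CT <= gap_max."""
    eps_max = 1.8 / (T * log2(q))
    gap_max = (M / T) * expected_rank(rank_pmf)
    return eps_max, gap_max

# Hypothetical example: M = 4, T = 16, q = 2 and a made-up rank distribution.
pmf = {2: 0.25, 3: 0.25, 4: 0.5}
c_ct = channel_training_rate(pmf, M=4, T=16)
eps_max, gap_max = gap_bounds(pmf, M=4, T=16, q=2)
```

Doubling $T$ in this example halves `gap_max`, which is the sense in which the loss from channel training vanishes for long blocks.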

For encoding, a channel training code can be regarded as a special subspace code. The decoding of channel training codes, however, uses the received matrices, while the decoding of subspace codes uses the subspaces spanned by the received matrices. Nothing prevents us from decoding a subspace code directly from the received matrices, and with this decoding method channel training becomes a special case of subspace coding. This is the reason that even some existing subspace coding schemes use channel training.

In this paper, we only study the design of channel training codes.



B. Some Existing Channel Training Codes 

Existing coding schemes for RLCN also work for LOCs, since a RLCN is a special LOC whose transformation matrix depends on the network topology. In fact, most coding practice for RLCN is based on channel training. We first introduce two coding schemes for RLCN.






The first coding scheme was introduced by Ho et al. [9]. They assumed that the transformation matrix has rank $M$. In their scheme, a codeword has the form in (30), where any matrix in $\mathbb{F}^{(T-M)\times M}$ can be used as $\tilde{X}$. We call such codes the classical channel training codes. A received matrix has the form

$$\mathbf{Y} = \begin{bmatrix} I \\ \tilde{\mathbf{X}} \end{bmatrix}\mathbf{H} = \begin{bmatrix} \mathbf{H} \\ \tilde{\mathbf{X}}\mathbf{H} \end{bmatrix}.$$

Since $\mathbf{H}$ has rank $M$, the receiver can decode $\tilde{\mathbf{X}}$ by solving a system of linear equations. The rateless realization of random linear network codes found in [16], [17] applies a classical channel training code over LOC$(GH, T)$, where LOC$(H, T)$ is the original channel and $G$ is an $r \times M$ purely random matrix. We will give a general discussion of this approach and show that we only need to consider $r \le M$.
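The decoding step of a classical channel training code can be sketched as follows. This is only an illustration under stated assumptions: the field is a prime field GF($p$) so that arithmetic is plain modular arithmetic, matrices are lists of lists, and the helper names (`matmul`, `solve_left`, `lift`, `decode`) are hypothetical.

```python
def matmul(A, B, p):
    """Matrix product over the prime field GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def solve_left(A, Y, p):
    """Return the unique B with B A = Y (mod p), or None if A lacks full row rank."""
    k, n = len(A), len(A[0])
    # Gauss-Jordan elimination on the augmented system A^T B^T = Y^T.
    aug = [[A[i][j] % p for i in range(k)] + [y[j] % p for y in Y] for j in range(n)]
    for col in range(k):
        piv = next((r for r in range(col, n) if aug[r][col]), None)
        if piv is None:
            return None  # no pivot in this column: the solution is not unique
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [[aug[c][k + r] for c in range(k)] for r in range(len(Y))]

def lift(X_tilde, M):
    """Prepend the M x M identity, giving the codeword form (30)."""
    identity = [[int(i == j) for j in range(M)] for i in range(M)]
    return identity + [list(row) for row in X_tilde]

def decode(Y, M, p):
    """Read H off the first M rows of Y, then solve X_tilde H = Y_tilde."""
    return solve_left(Y[:M], Y[M:], p)

p, M = 5, 2
H = [[1, 2, 0], [0, 1, 3]]          # hypothetical 2 x 3 channel instance of rank 2
X_tilde = [[4, 1], [2, 3]]          # payload: any (T - M) x M matrix
Y = matmul(lift(X_tilde, M), H, p)  # what the receiver observes
recovered = decode(Y, M, p)
```

When the channel instance has rank below $M$, `decode` returns `None`, which is the situation the rank-metric and linear matrix codes of the following sections are designed to handle.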

Silva et al. [20] proposed a more general method in which $\tilde{X}$ in (30) can only be chosen from a rank-metric code. The redundancy in the rank-metric code can be used to correct the rank deficiency of $H$ as well as additive errors, which are not considered in this work. This code construction is nearly optimal in terms of a Singleton-type coding bound on one-block subspace codes [19].

Both [9] and [20] construct channel training codes with unit block length, which in general cannot achieve the channel capacity of LOCs. Two more recent works [27], [28] consider the design of channel training codes with non-unit length. The authors proposed a multilevel code construction approach in [27]. Parallel to our work, this approach is used explicitly to construct "multishot rank-metric codes" [27]. Note that the multishot rank-metric codes constructed in [27] are different from the codes we propose here, even though both apply the rank metric. Since no performance evaluation of their codes is available, we cannot tell whether their codes achieve $C_{\mathrm{CT}}$.



C. Achieving Rates Higher than $C_{\mathrm{CT}}$

In the following, we introduce two constructions of channel training codes for LOCs, called lifted rank-metric codes and lifted linear matrix codes, respectively. We will prove that lifted linear matrix codes can achieve $C_{\mathrm{CT}}$. Our codes can also be used to achieve rates higher than $C_{\mathrm{CT}}$ using extended channel training. The approach is to use LOC$(H, T)$ as LOC$(GH, T)$ for an $r \times M$ random matrix $G$. As we discussed in §VI, we only need to consider $r \le M$, and Theorem 1 implies that a purely random $G$ is good enough.



D. Formulation of Channel Training Codes 

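A matrix code $\mathcal{C}^{(n)} \subseteq \mathbb{F}^{(T-M)\times nM}$ induces a channel training code for LOC$(H, T)$ with dimension $M \times N$ as follows. For $\tilde{X}^{(n)} \in \mathcal{C}^{(n)}$, we write

$$\tilde{X}^{(n)} = \begin{bmatrix} \tilde{X}_1 & \tilde{X}_2 & \cdots & \tilde{X}_n \end{bmatrix},$$

where $\tilde{X}_i \in \mathbb{F}^{(T-M)\times M}$. Define the $M$-lifting of $\tilde{X}^{(n)}$, which extends the definition of lifting in [20], as

$$L_M(\tilde{X}^{(n)}) = \left( \begin{bmatrix} I_M \\ \tilde{X}_1 \end{bmatrix}, \begin{bmatrix} I_M \\ \tilde{X}_2 \end{bmatrix}, \cdots, \begin{bmatrix} I_M \\ \tilde{X}_n \end{bmatrix} \right),$$

where $I_M$ is the $M \times M$ identity matrix. We see $L_M(\tilde{X}^{(n)}) \in (\mathbb{F}^{T\times M})^n$. Define the $M$-lifting of $\mathcal{C}^{(n)}$ as

$$L_M(\mathcal{C}^{(n)}) = \{ L_M(\tilde{X}^{(n)}) : \tilde{X}^{(n)} \in \mathcal{C}^{(n)} \}.$$

We call $L_M(\mathcal{C}^{(n)})$ the lifted matrix code of $\mathcal{C}^{(n)}$. When the context is clear, we write $L(\tilde{X}^{(n)})$ for $L_M(\tilde{X}^{(n)})$ and $L(\mathcal{C}^{(n)})$ for $L_M(\mathcal{C}^{(n)})$. The rate $\mathcal{R}^{(n)}$ of $L(\mathcal{C}^{(n)})$ is

$$\mathcal{R}^{(n)} = \frac{\log_2 |L(\mathcal{C}^{(n)})|}{nT\log_2 q} = \frac{\log_2 |\mathcal{C}^{(n)}|}{nT\log_2 q}.$$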

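Suppose that the transmitted codeword is $L(\tilde{\mathbf{X}}^{(n)})$. Each use of LOC$(H, T)$ can transmit one component of $L(\tilde{\mathbf{X}}^{(n)})$. The $i$th output matrix of LOC$(H,T)$ is

$$\mathbf{Y}_i = \begin{bmatrix} I_M \\ \tilde{\mathbf{X}}_i \end{bmatrix}\mathbf{H}_i = \begin{bmatrix} \mathbf{H}_i \\ \tilde{\mathbf{Y}}_i \end{bmatrix}, \qquad (31)$$

where $\mathbf{H}_i$ is the $i$th instance of $H$ and $\tilde{\mathbf{Y}}_i = \tilde{\mathbf{X}}_i\mathbf{H}_i$. Let

$$\mathbf{H}^{(n)} = \begin{bmatrix} \mathbf{H}_1 & & \\ & \ddots & \\ & & \mathbf{H}_n \end{bmatrix} \quad\text{and}\quad \tilde{\mathbf{Y}}^{(n)} = \begin{bmatrix} \tilde{\mathbf{Y}}_1 & \tilde{\mathbf{Y}}_2 & \cdots & \tilde{\mathbf{Y}}_n \end{bmatrix}.$$

We obtain the decoding equation of the lifted matrix code $L(\mathcal{C}^{(n)})$ as

$$\tilde{\mathbf{Y}}^{(n)} = \tilde{\mathbf{X}}^{(n)}\mathbf{H}^{(n)}. \qquad (32)$$

The decoding of $\tilde{\mathbf{Y}}^{(n)}$ can use the knowledge of $\mathbf{H}^{(n)}$.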

IX. Rank-Metric Codes for LOCs 
In this section, we extend the rank-metric approach of Silva et al. to construct matrix codes for LOCs. 






A. Rank-Metric Codes 

Define the rank distance between $X, X' \in \mathbb{F}^{t\times m}$ as

$$d(X,X') = \mathrm{rk}(X - X').$$

A rank-metric code is a unit-length matrix code equipped with the rank distance. The minimum distance of a rank-metric code $\mathcal{C} \subseteq \mathbb{F}^{t\times m}$ is

$$D(\mathcal{C}) = \min_{X \ne X' \in \mathcal{C}} d(X,X').$$

When $t \ge m$, we have

$$\frac{\log_2 |\mathcal{C}|}{t\log_2 q} \le m - D(\mathcal{C}) + 1, \qquad (33)$$

which is called the Singleton bound for rank-metric codes [21] (see also [20] and the references therein). Codes that achieve this bound are called maximum-rank-distance (MRD) codes. Gabidulin described a class of MRD codes for $t \ge m$, which are analogs of generalized Reed-Solomon codes [21].

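Suppose the transmitted codeword is $\mathbf{X}_0 \in \mathcal{C}$ and the received matrix is $\mathbf{Y} = \mathbf{X}_0\mathbf{H}$. If $\mathbf{H}$ is known at the receiver, we can decode $\mathbf{Y}$ using the minimum distance decoder defined as

$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X} \in \mathcal{C}} d(\mathbf{Y}, \mathbf{X}\mathbf{H}). \qquad (34)$$

Proposition 5: The minimum distance decoder is guaranteed to return $\hat{\mathbf{X}} = \mathbf{X}_0$ for all $\mathbf{H}$ with $\mathrm{rk}(\mathbf{H}) \ge r$ if and only if $D(\mathcal{C}) \ge m - r + 1$, where $0 < r \le m$.

Remark: Silva et al. only proved the sufficient condition in Prop. 5 when considering additive errors. In fact, the necessary condition also holds without considering the additive errors as in [19], [20].

Proof: We first prove the sufficient condition. Assume $D(\mathcal{C}) \ge m - r + 1$ and $\mathrm{rk}(\mathbf{H}) \ge r$. We know $d(\mathbf{Y}, \mathbf{X}_0\mathbf{H}) = 0$. Suppose that there exists a different codeword $\mathbf{X}_1 \in \mathcal{C}$ with $d(\mathbf{Y}, \mathbf{X}_1\mathbf{H}) = 0$. We have $(\mathbf{X}_0 - \mathbf{X}_1)\mathbf{H} = 0$. Using the rank-nullity theorem of linear algebra, $d(\mathbf{X}_0,\mathbf{X}_1) = \mathrm{rk}(\mathbf{X}_0 - \mathbf{X}_1) \le m - \mathrm{rk}(\mathbf{H}) \le m - r$, a contradiction to $D(\mathcal{C}) \ge m - r + 1$.

Now we prove the necessary condition. Assume $D(\mathcal{C}) \le m - r$. There must exist $\mathbf{X}_1, \mathbf{X}_2 \in \mathcal{C}$ such that $d(\mathbf{X}_1,\mathbf{X}_2) = \mathrm{rk}(\mathbf{X}_1 - \mathbf{X}_2) \le m - r$. Let

$$B = \{ \mathbf{h} \in \mathbb{F}^{m\times 1} : (\mathbf{X}_1 - \mathbf{X}_2)\mathbf{h} = 0 \}.$$

We know $\dim(B) = m - \mathrm{rk}(\mathbf{X}_1 - \mathbf{X}_2) \ge r$. By juxtaposing vectors in $B$, we can obtain a matrix $\mathbf{H}$ with $\mathrm{rk}(\mathbf{H}) \ge r$. We know $(\mathbf{X}_1 - \mathbf{X}_2)\mathbf{H} = 0$. So if the transformation matrix is $\mathbf{H}$, the decoder cannot always output the correct codeword. ■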
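Proposition 5 can be checked exhaustively on a toy example. The sketch below works over GF(2) with matrix rows stored as integer bitmasks, and uses a hypothetical $2\times 2$ code (a matrix representation of GF(4)) whose minimum rank distance is 2; all names are illustrative.

```python
from itertools import combinations, product

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def matmul_gf2(X, H):
    """X H over GF(2); bit j of a row of X selects row j of H."""
    out = []
    for x in X:
        acc = 0
        for j, h in enumerate(H):
            if (x >> j) & 1:
                acc ^= h
        out.append(acc)
    return tuple(out)

def rank_distance(X1, X2):
    return rank_gf2([a ^ b for a, b in zip(X1, X2)])

def min_distance(code):
    return min(rank_distance(a, b) for a, b in combinations(code, 2))

def decode(Y, H, code):
    """Minimum distance decoder (34), by exhaustive search over the code."""
    return min(code, key=lambda X: rank_distance(Y, matmul_gf2(X, H)))

def always_correct(code, m, N, r):
    """Check decoding for every m x N matrix H with rk(H) >= r and every codeword."""
    for H in product(range(2 ** N), repeat=m):
        if rank_gf2(H) >= r:
            if any(decode(matmul_gf2(X, H), H, code) != X for X in code):
                return False
    return True

# Hypothetical 2 x 2 code over GF(2) with D(C) = 2 (it meets the bound (33)).
CODE = [(0b00, 0b00), (0b01, 0b10), (0b10, 0b11), (0b11, 0b01)]
# A code with D(C) = 1 for comparison.
WEAK = [(0b00, 0b00), (0b01, 0b00)]
```

With $m = 2$, the code `CODE` satisfies $D(\mathcal{C}) \ge m - r + 1$ for $r = 1$, so `always_correct(CODE, 2, 2, 1)` holds, while `WEAK` only tolerates full-rank channel instances.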



B. Lifted Rank-Metric Codes 

Consider LOC$(H,T)$ with dimension $M \times N$. The lifted matrix code $L(\mathcal{C}^{(n)})$, where $\mathcal{C}^{(n)} \subseteq \mathbb{F}^{(T-M)\times nM}$ is a rank-metric code, is also called a lifted rank-metric code. The unit-block (one-shot) lifted rank-metric code ($n = 1$) was first used by Silva et al. in random network coding [20]. Here we extend their approach to multiple uses of the channel.

By the Singleton bound of rank-metric codes in (33),

$$\frac{\log_2 |\mathcal{C}^{(n)}|}{(T - M)\log_2 q} \le nM - D(\mathcal{C}^{(n)}) + 1.$$

Thus the rate of $L_M(\mathcal{C}^{(n)})$ is

$$\mathcal{R}^{(n)} = \frac{\log_2 |\mathcal{C}^{(n)}|}{nT\log_2 q} \overset{(a)}{\le} \frac{(nM - D(\mathcal{C}^{(n)}) + 1)(T - M)\log_2 q}{nT\log_2 q} = (1 - M/T)\left(M - D(\mathcal{C}^{(n)})/n + 1/n\right), \qquad (35)$$

where the equality in (a) is achieved by MRD codes.

Suppose that the transmitted codeword is $L(\tilde{\mathbf{X}}_0^{(n)})$. By the decoding equation in (32), we can decode $\tilde{\mathbf{Y}}^{(n)}$ using the minimum distance decoder defined in (34). By Prop. 5, the minimum distance decoder is guaranteed to return $\hat{\mathbf{X}}^{(n)} = \tilde{\mathbf{X}}_0^{(n)}$ for all $\mathbf{H}^{(n)}$ with $\mathrm{rk}(\mathbf{H}^{(n)}) \ge nM - D(\mathcal{C}^{(n)}) + 1$.

C. Throughput of Lifted Rank-Metric Codes 
Let

$$H^{(n)} = \begin{bmatrix} H_1 & & \\ & \ddots & \\ & & H_n \end{bmatrix}, \qquad (36)$$

in which $H_i$, $i = 1, \cdots, n$, are independent and follow the same distribution as $H$. By our analysis above, a receiver using the minimum distance decoder can judge whether its decoding is guaranteed to be correct by checking $\mathrm{rk}(\mathbf{H}^{(n)})$, where $\mathbf{H}^{(n)}$ is an instance of $H^{(n)}$. If $\mathrm{rk}(\mathbf{H}^{(n)}) \ge nM - D(\mathcal{C}^{(n)}) + 1$, the decoding is guaranteed to be correct. Otherwise, if $\mathrm{rk}(\mathbf{H}^{(n)}) < nM - D(\mathcal{C}^{(n)}) + 1$, correct decoding cannot be guaranteed. Define the throughput of $L(\mathcal{C}^{(n)})$ as

$$\mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)}) = \mathcal{R}^{(n)} \Pr\{\mathrm{rk}(H^{(n)}) \ge nM - D(\mathcal{C}^{(n)}) + 1\},$$

where RM stands for rank metric. Note that this is the zero-error maximum achievable rate of lifted rank-metric codes. For any rate higher than $\mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)})$, we cannot guarantee error-free decoding.

Since lifted rank-metric codes are a special channel training coding method, we have $\mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)}) \le C_{\mathrm{CT}}(H,T)$. Now we look at whether lifted rank-metric codes achieve $C_{\mathrm{CT}}(H,T)$.

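Theorem 9: For any positive integer $n$,

$$\max_{\mathcal{C}^{(n)} \subseteq \mathbb{F}^{(T-M)\times nM}} \mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)}) \le \rho^{(n)} C_{\mathrm{CT}}(H,T), \qquad (37)$$

where $\rho^{(n)} \le 1$ and the equality in (37) holds if there exist MRD codes $\mathcal{C}^{(n)} \subseteq \mathbb{F}^{(T-M)\times nM}$ with $D(\mathcal{C}^{(n)}) = nM - r + 1$ for $r = 1, 2, \cdots, nN^*$. Moreover, i) $\rho^{(n)} = 1$ if and only if $H$ has a constant rank; ii) $\lim_{n\to\infty} \rho^{(n)} = 1$.

Proof: Let $N^* = \min\{M, N\}$, the maximum possible rank of $H$. Let $\theta(\mathcal{C}^{(n)}) = nM - D(\mathcal{C}^{(n)}) + 1$. By (35),

$$\mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)}) \le \left(1 - \frac{M}{T}\right) \frac{\theta(\mathcal{C}^{(n)})}{n} \Pr\{\mathrm{rk}(H^{(n)}) \ge \theta(\mathcal{C}^{(n)})\},$$

where the equality holds for MRD codes. Thus

$$\frac{\max_{\mathcal{C}^{(n)} \subseteq \mathbb{F}^{(T-M)\times nM}} \mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)})}{C_{\mathrm{CT}}(H,T)} = \frac{\max_{r \le nN^*} \max_{\mathcal{C}^{(n)}:\,\theta(\mathcal{C}^{(n)}) = r} \mathrm{TP}_{\mathrm{RM}}(\mathcal{C}^{(n)})}{(1 - M/T)\,\mathrm{E}[\mathrm{rk}(H)]} \overset{(b)}{\le} \frac{\max_{r \le nN^*} r \Pr\{\mathrm{rk}(H^{(n)}) \ge r\}}{n\,\mathrm{E}[\mathrm{rk}(H)]} = \rho^{(n)},$$

where the equality in (b) holds if there exist MRD codes $\mathcal{C}^{(n)} \subseteq \mathbb{F}^{(T-M)\times nM}$ with $D(\mathcal{C}^{(n)}) = nM - r + 1$ for $r = 1, 2, \cdots, nN^*$.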

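Now we look at the properties of $\rho^{(n)}$. For any $0 < r \le nN^*$, we have

$$\mathrm{E}[\mathrm{rk}(H^{(n)})] = \sum_{s} s\, p_{\mathrm{rk}(H^{(n)})}(s) \overset{(c)}{\ge} \sum_{s \ge r} s\, p_{\mathrm{rk}(H^{(n)})}(s) \overset{(d)}{\ge} \sum_{s \ge r} r\, p_{\mathrm{rk}(H^{(n)})}(s) = r \Pr\{\mathrm{rk}(H^{(n)}) \ge r\}.$$

Since $\mathrm{E}[\mathrm{rk}(H^{(n)})] = n\,\mathrm{E}[\mathrm{rk}(H)]$, we have $\rho^{(n)} \le 1$. Now we check the condition for $\rho^{(n)} = 1$. First, if $p_{\mathrm{rk}(H)}(r_0) = 1$ for some $0 < r_0 \le M$, then $\rho^{(n)} = 1$. Second, if $\mathrm{E}[\mathrm{rk}(H^{(n)})] = r_n \Pr\{\mathrm{rk}(H^{(n)}) \ge r_n\}$ for some $0 < r_n \le nN^*$, then the equalities in (c) and (d) hold, which gives $\Pr\{\mathrm{rk}(H^{(n)}) = r_n\} = 1$. Hence, $\Pr\{\mathrm{rk}(H) = r_n/n\} = 1$.

Let $\mu = \mathrm{E}[\mathrm{rk}(H)]$. By the weak law of large numbers, for any $\delta > 0$ and $\epsilon > 0$ there exists $n_0$ such that when $n > n_0$,

$$\Pr\{|\mathrm{rk}(H^{(n)})/n - \mu| < \delta/2\} > 1 - \epsilon.$$

Hence,

$$\Pr\{\mathrm{rk}(H^{(n)})/n > \mu - \delta/2\} > 1 - \epsilon.$$

Further, for the same $\delta$, when $n > 2/\delta$ there exists an integer $r_0$ between $n(\mu - \delta)$ and $n(\mu - \delta/2)$. So, when $n > \max\{n_0, 2/\delta\}$,

$$\rho^{(n)} \ge \frac{r_0 \Pr\{\mathrm{rk}(H^{(n)}) \ge r_0\}}{n\mu} \ge \frac{n(\mu - \delta)\Pr\{\mathrm{rk}(H^{(n)}) \ge n(\mu - \delta/2)\}}{n\mu} > \frac{(\mu - \delta)(1 - \epsilon)}{\mu} > 1 - (\delta/\mu + \epsilon).$$

Therefore, $\lim_{n\to\infty} \rho^{(n)} = 1$. ■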
We know that when $T - M \ge nM$, for any $0 < r \le nN^*$ an MRD code $\mathcal{C}^{(n)}$ with $D(\mathcal{C}^{(n)}) = nM - r + 1$ can be constructed using Gabidulin codes [21]. If we use Gabidulin codes, the equality in (37) holds when $n \le T/M - 1$. Let us consider two cases: i) $H$ has a constant rank. Now $\rho^{(1)} = 1$. Thus when $T \ge 2M$, lifted Gabidulin codes can achieve $C_{\mathrm{CT}}$. ii) $H$ has a random rank. We require a sufficiently large $n_0$ to guarantee that $\rho^{(n_0)}$ is close to 1. If $T \ge (n_0 + 1)M$, lifted Gabidulin codes can approach $C_{\mathrm{CT}}$. The open case is $T < (n_0 + 1)M$, for which we do not know whether lifted rank-metric codes achieve $C_{\mathrm{CT}}$.
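The quantity $\rho^{(n)} = \max_{r \le nN^*} r \Pr\{\mathrm{rk}(H^{(n)}) \ge r\} / (n\,\mathrm{E}[\mathrm{rk}(H)])$ from the proof of Theorem 9 is easy to evaluate numerically, because $\mathrm{rk}(H^{(n)})$ is a sum of $n$ independent copies of $\mathrm{rk}(H)$. The following sketch assumes the rank distribution is given as a list `pmf` with `pmf[r]` $= \Pr\{\mathrm{rk}(H) = r\}$; the function names and the example distribution are illustrative.

```python
def convolve(a, b):
    """Distribution of the sum of two independent nonnegative integer variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def rank_pmf_n(pmf, n):
    """pmf of rk(H^(n)) = rk(H_1) + ... + rk(H_n) for i.i.d. H_i."""
    out = [1.0]
    for _ in range(n):
        out = convolve(out, pmf)
    return out

def rho(pmf, n):
    """rho^(n) = max_r r Pr{rk(H^(n)) >= r} / (n E[rk(H)])."""
    mu = sum(r * p for r, p in enumerate(pmf))
    pn = rank_pmf_n(pmf, n)
    best, tail = 0.0, 0.0
    for r in range(len(pn) - 1, 0, -1):
        tail += pn[r]              # tail = Pr{rk(H^(n)) >= r}
        best = max(best, r * tail)
    return best / (n * mu)

constant = [0.0, 0.0, 1.0]         # rank is always 2
mixed = [0.0, 0.5, 0.5]            # hypothetical: rank 1 or 2 with equal probability
values = [rho(mixed, n) for n in (1, 10, 50)]
```

For the constant-rank distribution `rho` returns 1 for every $n$, and for the mixed distribution the values grow toward 1 with $n$, in line with parts i) and ii) of Theorem 9.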
TABLE I
The values $\rho_{\min}(c, 6)$

c                     1      2      3      4      5      6
$\rho_{\min}(c, 6)$   0.408  0.408  0.460  0.526  0.649  1.0

D. Insufficiency of Unit-block Lifted Rank-Metric Codes 

In general, we need $n > 1$ to have $\rho^{(n)}$ close to 1. In this paper we do not study how large $n$ must be to have $1 - \rho^{(n)} < \epsilon$. But we want to see whether $n = 1$ is good enough, because of its low encoding and decoding complexity. As we show in the following, however, unit-length lifted rank-metric codes cannot achieve $C_{\mathrm{CT}}(H,T)$ in general, and the gap between the maximum achievable rate of unit-block lifted rank-metric codes and $C_{\mathrm{CT}}(H,T)$ can be large. Our evaluation reflects the performance of such codes for random linear network coding.

Recall that $N^* = \min\{M, N\}$. For $0 < c \le N^*$ and $N^* > 0$, define

$$\rho_{\min}(c, N^*) = \min_{p_{\mathrm{rk}(H)}:\ \mathrm{E}[\mathrm{rk}(H)] = c,\ \mathrm{rk}(H) \le N^*} \rho^{(1)}.$$

Considering $T \ge 2M$, there exists a rank distribution of $H$ such that

$$\max_{\mathcal{C} \subseteq \mathbb{F}^{(T-M)\times M}} \mathrm{TP}_{\mathrm{RM}}(\mathcal{C}) = \rho_{\min}(c, N^*)\, C_{\mathrm{CT}}(H,T).$$

Linear programming algorithms can be applied to find $\rho_{\min}(c, N^*)$. In Table I, we show the values $\rho_{\min}(c, 6)$ for $c = 1, \cdots, 6$. We see $\rho_{\min}(6, 6) = 1$, which is the case that the channel has a constant rank. For $c < 6$, $\rho_{\min}(c, 6)$ is less than 0.65. In Fig. 5, we show that the value of $\rho_{\min}(3, N^*)$ decreases with $N^*$. The value $\rho_{\min}(3, 200)$ is even less than one-fifth, which means that for such a rank distribution unit-block lifted rank-metric codes can achieve less than one-fifth of $C_{\mathrm{CT}}(H,T)$.
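The values in Table I can be reproduced without a linear programming library. The sketch below is not the authors' procedure but an equivalent reformulation under the following reasoning: writing $a_r = \Pr\{\mathrm{rk}(H) \ge r\}$, we have $\mathrm{E}[\mathrm{rk}(H)] = \sum_{r=1}^{N^*} a_r$, and $\max_r r a_r \le t$ forces $a_r \le \min\{1, t/r\}$. The smallest feasible $t$ therefore solves $\sum_{r=1}^{N^*} \min\{1, t/r\} = c$, and $\rho_{\min}(c, N^*) = t/c$. The function name and the bisection are illustrative choices.

```python
def rho_min(c, n_star, iters=100):
    """Smallest value of max_r r Pr{rk(H) >= r} / c over rank laws with mean c."""
    def max_mean(t):
        # Largest mean achievable when r * Pr{rk(H) >= r} <= t for all r.
        return sum(min(1.0, t / r) for r in range(1, n_star + 1))

    lo, hi = 0.0, float(n_star)
    for _ in range(iters):          # bisection on the monotone function max_mean
        mid = (lo + hi) / 2
        if max_mean(mid) < c:
            lo = mid
        else:
            hi = mid
    return hi / c

table = [round(rho_min(c, 6), 3) for c in range(1, 7)]
```

The list `table` can be compared entry by entry with Table I, and `rho_min(3, 200)` with the statement about Fig. 5.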

E. Complexity of Lifted Rank-Metric Codes 

If we apply Gabidulin codes, a family of MRD codes, the encoding requires $O((T - M)n^2 sM)$ operations in $\mathbb{F}$. For decoding, we can apply the algorithm in [20], whose complexity is $O(D(\mathcal{C}^{(n)})(T - M)n^2 s^2)$ operations in $\mathbb{F}$. (Here we assume that one field operation in $\mathrm{GF}(q^m)$ requires $O(m^2)$ field operations in $\mathbb{F}$.)

X. Linear Matrix Codes for LOCs 

In general, we require $T \gg M$ to achieve $C_{\mathrm{CT}}(H, T)$ using lifted rank-metric codes. In this section, we propose another coding scheme that can achieve $C_{\mathrm{CT}}$ for all $T > M$.

Fig. 5. The value of $\rho_{\min}(3, N^*)$ for $N^* = 3, 4, \cdots, 200$.



A. Linear Matrix Codes 

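Consider LOC$(H, T)$ with dimension $M \times N$. For any positive real number $s < N$, let $G^{(n)}$ be an $\lfloor ns \rfloor \times nM$ matrix, called the generator matrix. The matrix code generated by $G^{(n)}$ is

$$\mathcal{G}^{(n)} = \{ BG^{(n)} : B \in \mathbb{F}^{(T-M)\times \lfloor ns \rfloor} \}.$$

The code for LOC$(H, T)$ is the lifted matrix code $L(\mathcal{G}^{(n)})$, called a lifted linear matrix code. The rate of $L(\mathcal{G}^{(n)})$ is

$$\mathcal{R}^{(n)} = \frac{\log_2 |\mathbb{F}^{(T-M)\times \lfloor ns \rfloor}|}{nT\log_2 q} = (1 - M/T)\lfloor ns \rfloor / n.$$

When $n \to \infty$, $\mathcal{R}^{(n)} \to (1 - M/T)s$.

Suppose that the transmitted codeword is $L(\mathbf{B}_0 G^{(n)})$. The received matrix is given by (31). The decoding equation in (32) now becomes

$$\tilde{\mathbf{Y}}^{(n)} = \mathbf{B}_0 G^{(n)} \mathbf{H}^{(n)}. \qquad (38)$$

Since the receiver knows $\mathbf{H}^{(n)}$ and $G^{(n)}$, the information $\mathbf{B}_0$ can be uniquely determined if and only if $G^{(n)}\mathbf{H}^{(n)}$ has full row rank. Thus, the decoding error probability $P_e^{(n)}$ using (38) satisfies

$$P_e^{(n)} \le \Pr\{\mathrm{rk}(G^{(n)}H^{(n)}) < \lfloor ns \rfloor\}.$$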
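A lifted linear matrix code can be sketched end to end as follows. The example is an illustration under stated assumptions: arithmetic is over a prime field GF($p$), matrices are lists of lists, and the helper names and the small hand-picked matrices are hypothetical. Decoding solves (38) by Gaussian elimination and reports failure when $G^{(n)}\mathbf{H}^{(n)}$ lacks full row rank.

```python
def matmul(A, B, p):
    """Matrix product over the prime field GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def solve_left(A, Y, p):
    """Return the unique B with B A = Y (mod p), or None if A lacks full row rank."""
    k, n = len(A), len(A[0])
    # Gauss-Jordan elimination on the augmented system A^T B^T = Y^T.
    aug = [[A[i][j] % p for i in range(k)] + [y[j] % p for y in Y] for j in range(n)]
    for col in range(k):
        piv = next((r for r in range(col, n) if aug[r][col]), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [[aug[c][k + r] for c in range(k)] for r in range(len(Y))]

def block_diag(blocks):
    """Block-diagonal matrix H^(n) built from H_1, ..., H_n."""
    cols = sum(len(b[0]) for b in blocks)
    out, c0 = [], 0
    for b in blocks:
        for row in b:
            out.append([0] * c0 + list(row) + [0] * (cols - c0 - len(row)))
        c0 += len(b[0])
    return out

def transmit(B, G, Hs, M, p):
    """Send L(B G) over n channel uses; return (Y_tilde, H^(n)) seen by the receiver."""
    X = matmul(B, G, p)                        # (T - M) x nM codeword of the matrix code
    identity = [[int(i == j) for j in range(M)] for i in range(M)]
    Y_parts, H_seen = [], []
    for i, H in enumerate(Hs):
        Xi = [row[i * M:(i + 1) * M] for row in X]
        Yi = matmul(identity + Xi, H, p)       # i-th channel output, as in (31)
        H_seen.append(Yi[:M])                  # training part reveals H_i
        Y_parts.append(Yi[M:])                 # payload part is X_i H_i
    Y = [sum((part[r] for part in Y_parts), []) for r in range(len(B))]
    return Y, block_diag(H_seen)

def decode(Y, G, H_n, p):
    """Solve the decoding equation (38) for B."""
    return solve_left(matmul(G, H_n, p), Y, p)

p, M = 5, 2
Hs = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]      # hypothetical instances: ranks 2 and 1
B = [[2, 3, 4]]                                # message, (T - M) x floor(ns) with n = 2, s = 1.5
G_good = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
G_bad = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
Y_good, H_n = transmit(B, G_good, Hs, M, p)
Y_bad, _ = transmit(B, G_bad, Hs, M, p)
```

Here `decode(Y_good, G_good, H_n, p)` recovers `B` even though the second channel instance is rank deficient, whereas `G_bad` yields a product $G\mathbf{H}^{(n)}$ of rank 2 and decoding fails.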
We prove the following result in the next subsection.

Theorem 10: Consider linear matrix codes for LOC$(H, T)$ with dimension $M \times N$, and $(s, \epsilon)$ satisfying $0 < s < s + \epsilon < \mathrm{E}[\mathrm{rk}(H)]$. More than half of the matrices $G^{(n)} \in \mathbb{F}^{\lfloor ns \rfloor \times nM}$, when used as the generator matrix, give

$$P_e^{(n)} \le \Pr\{\mathrm{rk}(G^{(n)}H^{(n)}) < \lfloor ns \rfloor\} < 2\left( \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1} + g(s + \epsilon)^n \right),$$

where $g(s + \epsilon) < 1$ is defined in (39).

Thus for any $R < \mathrm{E}[\mathrm{rk}(H)]$, there exists a sequence of lifted linear matrix codes with rate $\mathcal{R}^{(n)} \to (1 - M/T)R$ and $P_e^{(n)} \to 0$ as $n \to \infty$. Moreover, $P_e^{(n)}$ decreases exponentially with increasing $n$. So lifted linear matrix codes can achieve the rate $(1 - M/T)\,\mathrm{E}[\mathrm{rk}(H)]$.

B. Performance of Linear Matrix Codes 

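Lemma 7 (Chernoff Bound): Let $\tau_i$, $i = 1, \cdots, n$, be independent random variables with the same distribution as $\tau \in \{0, 1, \cdots, m\}$. For $\alpha < \mathrm{E}[\tau]$,

$$\Pr\Big\{\sum_{i=1}^n \tau_i \le n\alpha\Big\} \le g(\alpha)^n,$$

where

$$g(\alpha) = \mathrm{E}\big[(A/B)^{(\tau - \alpha)/m}\big] < 1, \qquad (39)$$

$$A = \sum_{r < \alpha} (\alpha - r)\, p_\tau(r), \qquad B = \sum_{r > \alpha} (r - \alpha)\, p_\tau(r).$$

Proof: For any $t > 0$,

$$\Pr\Big\{\sum_{i} \tau_i \le n\alpha\Big\} = \Pr\big\{e^{-t\sum_i \tau_i} \ge e^{-tn\alpha}\big\} \overset{(a)}{\le} e^{tn\alpha}\, \mathrm{E}\big[e^{-t\sum_i \tau_i}\big] \overset{(b)}{=} e^{tn\alpha} \prod_{i} \mathrm{E}\big[e^{-t\tau_i}\big] = \big(e^{t\alpha}\, \mathrm{E}[e^{-t\tau}]\big)^n, \qquad (40)$$

where (a) follows from Markov's inequality and (b) follows from independence.

Now assume $\alpha < \mathrm{E}[\tau]$. Let $f(t) = e^{t\alpha}\, \mathrm{E}[e^{-t\tau}]$. We know that $f(t)$ is a continuous function for $t \ge 0$ and $f(0) = 1$. The first and the second derivatives of $f(t)$ are

$$f'(t) = \sum_{r} (\alpha - r) e^{t(\alpha - r)} p_\tau(r), \quad\text{and}\quad f''(t) = \sum_{r} (\alpha - r)^2 e^{t(\alpha - r)} p_\tau(r),$$

respectively. We see that $f'(0) = \alpha - \mathrm{E}[\tau] < 0$ and $f''(t) > 0$. Thus, there exists $t_0 > 0$ such that $f'(t_0) = 0$ and $f'(t) < 0$ for $0 \le t < t_0$. We give a bound on $t_0$ in the following. Let

$$A(t) = \sum_{r < \alpha} (\alpha - r) e^{t(\alpha - r)} p_\tau(r), \quad\text{and}\quad B(t) = \sum_{r > \alpha} (r - \alpha) e^{t(\alpha - r)} p_\tau(r).$$

We see that $A(t)$ and $B(t)$ are monotonically increasing and decreasing, respectively. Since $f'(t) = A(t) - B(t)$, we have $A(t_0) = B(t_0)$ and $A(0) < B(0)$. Observe that

$$A(t) \le A(0)e^{t\alpha}, \qquad B(t) \ge B(0)e^{-t(m - \alpha)}.$$

Let $t_1$ be such that

$$A(0)e^{t_1\alpha} = B(0)e^{-t_1(m - \alpha)}. \qquad (41)$$

We know that $0 < t_1 \le t_0$. Thus, $f(t_0) \le f(t_1) < 1$. By (40),

$$\Pr\Big\{\sum_{i} \tau_i \le n\alpha\Big\} \le \min_{t > 0} f^n(t) = f^n(t_0) \le f^n(t_1).$$

Using (41), we have $e^{t_1} = (B(0)/A(0))^{1/m}$. The proof is completed by letting $g(\alpha) = f(t_1)$, with $A = A(0)$ and $B = B(0)$. ■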
Remark: An alternative to the Chernoff bound is Hoeffding's inequality, which gives

$$\Pr\Big\{\sum_{i} \tau_i \le n\alpha\Big\} \le \exp\left\{ -\frac{2n(\alpha - \mathrm{E}[\tau])^2}{m^2} \right\}.$$

But in our simulation, the error exponent obtained by the Chernoff bound is better than the one obtained using Hoeffding's inequality.
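The bound of Lemma 7 can be compared numerically with the exact lower tail and with Hoeffding's inequality. The sketch below assumes a distribution given as a list `pmf` with `pmf[r]` $= \Pr\{\tau = r\}$ and $m$ equal to the largest index; the function names and the example distribution are illustrative, and the single example is not meant to reproduce the simulation mentioned in the remark.

```python
from math import exp

def chernoff_g(pmf, a):
    """g(a) = E[(A/B)^((tau - a)/m)] as in (39), for pmf[r] = Pr{tau = r}."""
    m = len(pmf) - 1
    A = sum((a - r) * p for r, p in enumerate(pmf) if r < a)
    B = sum((r - a) * p for r, p in enumerate(pmf) if r > a)
    if not 0 < A < B:
        # B - A = E[tau] - a, so A < B is the condition a < E[tau].
        raise ValueError("need Pr{tau < a} > 0 and a < E[tau]")
    return sum(p * (A / B) ** ((r - a) / m) for r, p in enumerate(pmf))

def exact_tail(pmf, n, a):
    """Exact Pr{tau_1 + ... + tau_n <= n a} by repeated convolution."""
    dist = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(dist) + len(pmf) - 1)
        for i, x in enumerate(dist):
            for j, y in enumerate(pmf):
                nxt[i + j] += x * y
        dist = nxt
    return sum(p for s, p in enumerate(dist) if s <= n * a)

def hoeffding(pmf, n, a):
    """Hoeffding bound exp(-2 n (a - E[tau])^2 / m^2) for variables in [0, m]."""
    m = len(pmf) - 1
    mu = sum(r * p for r, p in enumerate(pmf))
    return exp(-2 * n * (a - mu) ** 2 / m ** 2)

pmf = [0.1, 0.2, 0.7]      # hypothetical rank distribution, E[tau] = 1.6
a = 1.2
g = chernoff_g(pmf, a)
```

For this distribution `g ** n` upper-bounds `exact_tail(pmf, n, a)` for every $n$ tried, and it is smaller than `hoeffding(pmf, n, a)`, consistent with the remark above.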

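Lemma 8: Suppose that $G^{(n)}$ is an $\lfloor ns \rfloor \times nM$ purely random matrix that is independent of $H^{(n)}$. For any $s$ and $\epsilon$ such that $0 < s < s + \epsilon < \mathrm{E}[\mathrm{rk}(H)]$,

$$\Pr\{\mathrm{rk}(G^{(n)}H^{(n)}) < \lfloor ns \rfloor\} < \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1} + g(s + \epsilon)^n,$$

where $g(s + \epsilon) < 1$ is defined in (39).

Proof: Let $F^{(n)} = G^{(n)}H^{(n)}$ and let

$$\alpha_n(i) = \Pr\{\mathrm{rk}(F^{(n)}) = \lfloor ns \rfloor \mid \mathrm{rk}(H^{(n)}) = i\}.$$

Let $F_i$ be the $i$th row of $F^{(n)}$. Since $G^{(n)}$ contains uniformly independent components, $F_i$, $i = 1, \cdots, \lfloor ns \rfloor$, are independent and uniformly distributed in the vector space spanned by the row vectors of $H^{(n)}$. For $i \ge \lfloor n(s + \epsilon) \rfloor$,

$$\alpha_n(i) \overset{(a)}{=} \zeta^i_{\lfloor ns \rfloor} = \prod_{k = i - \lfloor ns \rfloor + 1}^{i} (1 - q^{-k}) > \prod_{k = \lfloor n(s+\epsilon) \rfloor - \lfloor ns \rfloor + 1}^{\infty} (1 - q^{-k}) \ge \prod_{k = \lfloor n\epsilon \rfloor + 1}^{\infty} (1 - q^{-k}) > 1 - \sum_{k = \lfloor n\epsilon \rfloor + 1}^{\infty} q^{-k} = 1 - \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1},$$

where (a) follows from Lemma 11. Moreover, using the Chernoff bound in Lemma 7,

$$\Pr\{\mathrm{rk}(H^{(n)}) < \lfloor n(s + \epsilon) \rfloor\} \le \Pr\{\mathrm{rk}(H^{(n)}) < n(s + \epsilon)\} \le g(s + \epsilon)^n,$$

where $g(\cdot)$ is defined in (39) and $g(s + \epsilon) < 1$. Therefore,

$$\Pr\{\mathrm{rk}(F^{(n)}) = \lfloor ns \rfloor\} \ge \sum_{i \ge \lfloor n(s+\epsilon) \rfloor} \alpha_n(i)\, p_{\mathrm{rk}(H^{(n)})}(i) > \left(1 - \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1}\right) \Pr\{\mathrm{rk}(H^{(n)}) \ge \lfloor n(s + \epsilon) \rfloor\} \ge \left(1 - \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1}\right)\big(1 - g(s + \epsilon)^n\big) > 1 - \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1} - g(s + \epsilon)^n.$$

The proof is completed. ■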
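Lemma 9: Let $0 \le b_i \le 1$, $i = 1, \cdots, n$, be a sequence of real numbers. If $\sum_{i=1}^{n} b_i / n \le \epsilon/2$ for some $\epsilon > 0$, then more than half of the numbers in the sequence have values at most $\epsilon$.

Proof: Let $A = \{ b_i : b_i \le \epsilon \}$. If $|A| \le n/2$, then

$$\sum_{i=1}^{n} b_i = \sum_{b_i \in A} b_i + \sum_{b_i \notin A} b_i > \epsilon(n - |A|) \ge n\epsilon/2.$$

We have a contradiction to $\sum_{i=1}^{n} b_i / n \le \epsilon/2$. Thus, $|A| > n/2$. ■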
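Proof of Theorem 10: There are in total $q^{n\lfloor ns \rfloor M}$ matrices of size $\lfloor ns \rfloor \times nM$. The average probability of error, when using these matrices uniformly, is upper bounded by

$$\sum_{\mathbf{G} \in \mathbb{F}^{\lfloor ns \rfloor \times nM}} \Pr\{\mathrm{rk}(\mathbf{G}H^{(n)}) < \lfloor ns \rfloor\}\, q^{-n\lfloor ns \rfloor M} = \sum_{\mathbf{G}} \Pr\{\mathrm{rk}(\mathbf{G}H^{(n)}) < \lfloor ns \rfloor\}\, p_{G^{(n)}}(\mathbf{G}) = \Pr\{\mathrm{rk}(G^{(n)}H^{(n)}) < \lfloor ns \rfloor\} < \frac{q^{-\lfloor n\epsilon \rfloor}}{q - 1} + g(s + \epsilon)^n,$$

where $G^{(n)}$ is a purely random matrix and the last inequality follows from Lemma 8. Thus by Lemma 9, more than half of these matrices give an error probability lower than $2\big(q^{-\lfloor n\epsilon \rfloor}/(q - 1) + g(s + \epsilon)^n\big)$. ■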

C. Complexity of Lifted Linear Matrix Codes 

In practice, we can use a pseudorandom generator to generate the matrix $G^{(n)}$, called a pseudorandom generator matrix, and share the pseudorandom generator between the transmitter and the receiver. The design of the pseudorandom generator is outside the scope of this paper. The encoding complexity using a pseudorandom generator matrix is $O((T - M)Msn^2)$, and decoding based on Gaussian elimination requires $O(n^3 s^3 + (T - M)n^2 s^2)$ operations in $\mathbb{F}$.

Compared with lifted Gabidulin codes, the complexity of decoding a lifted linear matrix code using Gaussian elimination is higher. Reducing the encoding and decoding complexity is important future work to make lifted linear matrix codes practical.

D. Rateless Coding 

Our coding schemes, both the lifted rank-metric codes and the lifted linear matrix codes, require only the knowledge of $\mathrm{E}[\mathrm{rk}(H)]$. Here we show that the lifted linear matrix codes can be realized ratelessly without the knowledge of $\mathrm{E}[\mathrm{rk}(H)]$ if there exists one-bit feedback from the receiver to the transmitter.

Suppose that we have a sequence of $R \times M$ matrices $G_i$, $i = 1, 2, \cdots$, called the series of the generator matrices of rateless lifted linear matrix codes, which is known by both the transmitter and the receiver. Here $R$ is a design parameter. Write

$$G^{(n)} = \begin{bmatrix} G_1 & G_2 & \cdots & G_n \end{bmatrix}.$$

The transmitter forms its messages into a $(T - M) \times R$ message matrix $\mathbf{B}$, and it keeps on transmitting $L(\mathbf{B}G_i)$, $i = 1, 2, \cdots$, until it receives feedback from the receiver. The $i$th output of the channel is given in (31). After collecting the $n$th output, the receiver checks whether $G^{(n)}\mathbf{H}^{(n)}$ has rank $R$. If $G^{(n)}\mathbf{H}^{(n)}$ has rank $R$, the receiver sends feedback to the transmitter and decodes the message matrix $\mathbf{B}$ by solving the equation $\tilde{\mathbf{Y}}^{(n)} = \mathbf{B}G^{(n)}\mathbf{H}^{(n)}$. After receiving the feedback, the transmitter can transmit another message matrix.

Applying Theorem 10, we can evaluate the performance of the rateless code. The rateless lifted linear matrix codes can achieve the rate $(1 - M/T)\,\mathrm{E}[\mathrm{rk}(H)]$.
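The rateless procedure can be sketched as a loop on the receiver side. As before, the example assumes a prime field GF($p$) and list-of-lists matrices, and every name and matrix in it is hypothetical. Because $\mathbf{H}^{(n)}$ is block diagonal, $G^{(n)}\mathbf{H}^{(n)}$ is the horizontal concatenation of the products $G_i\mathbf{H}_i$, which is what the loop accumulates.

```python
def matmul(A, B, p):
    """Matrix product over the prime field GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def solve_left(A, Y, p):
    """Return the unique B with B A = Y (mod p), or None if A lacks full row rank."""
    k, n = len(A), len(A[0])
    # Gauss-Jordan elimination on the augmented system A^T B^T = Y^T.
    aug = [[A[i][j] % p for i in range(k)] + [y[j] % p for y in Y] for j in range(n)]
    for col in range(k):
        piv = next((r for r in range(col, n) if aug[r][col]), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [[aug[c][k + r] for c in range(k)] for r in range(len(Y))]

def rateless_session(B, gens, channels, p):
    """Transmit L(B G_i) until G^(n) H^(n) has rank R; return (decoded B or None, uses)."""
    GH = [[] for _ in gens[0]]       # R x (n N) matrix G^(n) H^(n), grows with n
    Y = [[] for _ in B]              # payload part Y_tilde^(n)
    uses = 0
    for G, H in zip(gens, channels):
        uses += 1
        GiHi = matmul(G, H, p)                 # receiver knows G_i and learns H_i from (31)
        Yi = matmul(matmul(B, G, p), H, p)     # payload rows of the i-th output
        for row, ext in zip(GH, GiHi):
            row.extend(ext)
        for row, ext in zip(Y, Yi):
            row.extend(ext)
        decoded = solve_left(GH, Y, p)
        if decoded is not None:                # rank R reached: send the one-bit feedback
            return decoded, uses
    return None, uses

p = 5
B = [[1, 2, 3]]                                # (T - M) x R message with R = 3
gens = [[[1, 0], [0, 1], [0, 0]], [[0, 0], [0, 0], [1, 0]]]
channels = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
decoded, uses = rateless_session(B, gens, channels, p)
```

In this toy run the first channel use only provides rank 2, so the receiver stays silent, and the second use completes the rank and triggers decoding.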

Corollary 5: Consider rateless linear matrix codes for LOC$(H, T)$ with dimension $M \times N$. There exists a series of generator matrices of rateless lifted linear matrix codes $G_i \in \mathbb{F}^{R\times M}$, $i = 1, 2, \ldots$, such that the transmission of one message matrix can be successfully decoded with probability at least



after $n > R/\mathrm{E}[\mathrm{rk}(H)]$ transmissions, where $0 < \epsilon < \mathrm{E}[\mathrm{rk}(H)] - R/n$ and $g(R/n + \epsilon) < 1$ is defined in (39).



XI. Concluding Remarks

The linear operator channel is a general channel model that includes linear network coding as well as the classical Z-channel as special cases. We studied LOCs with general distributions of transformation matrices.

This work showed that the expectation of the rank of the transformation matrix, $\mathrm{E}[\mathrm{rk}(H)]$, is an important parameter of LOC$(H, T)$. Essentially, this is the best rate that noncoherent transmission can asymptotically achieve when $T$ goes to infinity. We showed that both subspace coding and channel training can achieve at least $(1 - M/T)\,\mathrm{E}[\mathrm{rk}(H)]$.

This work studied subspace coding from an information theoretic point of view. Compared with general subspace coding, constant-dimensional subspace coding can achieve almost the same rate. Given a LOC, we determined the maximum achievable rate of constant-dimensional subspace coding, as well as the optimal dimension.

We determined the maximum achievable rate of channel training. The advantage of subspace coding over channel training in terms of rate is not significant for typical channel parameters, so considering channel training for LOCs is sufficient for most scenarios. We proposed two coding approaches for LOCs based on channel training and evaluated their performance.

Many problems about LOCs need further investigation. For small $T$ (e.g., $T < M$), we still lack good bounds and coding schemes. It is possible to extend this work to LOCs with additive errors and to multi-user communication scenarios. Moreover, efficient encoding and decoding algorithms for the coding approaches we proposed are required for practical applications.

Appendix A
Counting

Parts of the counting problems here can be found in various sources, e.g., [29], [30] and the references therein. Here we give self-contained proofs.

Lemma 10: When $0 \le r \le m$, $|\mathrm{Fr}(\mathbb{F}^{m\times r})| = \chi^m_r$.

Proof: The lemma is trivial for $r = 0$, so we consider $r > 0$. We can count the number of full-rank matrices in $\mathbb{F}^{m\times r}$ column by column. For the first column, we can choose any vector in $\mathbb{F}^m$ except the zero vector. Thus we have $q^m - 1$ choices. With the first column fixed, say $v_1$, we want to choose the second column $v_2$ in $\mathbb{F}^m$ so that it is linearly independent of $v_1$. Hence, we have $q^m - q$ choices of $v_2$. Repeating this process, we obtain that the number of full-rank $m \times r$ matrices is $(q^m - 1)(q^m - q)\cdots(q^m - q^{r-1}) = \chi^m_r$. ■
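The count in Lemma 10 is easy to confirm by brute force for small parameters. The sketch below works over GF(2), stores each row of a matrix as an integer bitmask, and tallies all $m \times r$ matrices by rank; the function names are illustrative.

```python
from itertools import product

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def chi(m, r, q=2):
    """chi^m_r = (q^m - 1)(q^m - q) ... (q^m - q^(r-1)); equals 1 for r = 0."""
    out = 1
    for k in range(r):
        out *= q ** m - q ** k
    return out

def gaussian_binomial(m, r, q=2):
    """Number of r-dimensional subspaces of F^m, computed as chi^m_r / chi^r_r."""
    return chi(m, r, q) // chi(r, r, q)

def count_by_rank(m, r):
    """Tally all m x r matrices over GF(2) by rank."""
    counts = {}
    for rows in product(range(2 ** r), repeat=m):
        k = rank_gf2(rows)
        counts[k] = counts.get(k, 0) + 1
    return counts

counts = count_by_rank(3, 2)
```

The entry `counts[2]` is the number of full-rank $3 \times 2$ binary matrices and should agree with `chi(3, 2)`; the remaining entries account for all other $3 \times 2$ matrices.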
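Recall

$$\zeta^m_r = \begin{cases} (1 - q^{-m})(1 - q^{-m+1}) \cdots (1 - q^{-m+r-1}) & r > 0 \\ 1 & r = 0 \end{cases}$$

for $r \le m$.

Lemma 11: Let $G$ be an $s \times m$ random matrix with uniformly independent components over $\mathbb{F}$. Then for $r \le m$,

$$p_{\mathrm{rk}(GH)|\mathrm{rk}(H)}(s|r) = \zeta^r_s,$$

where $H$ is any $m \times n$ random matrix.

Proof: Fix an $m \times n$ matrix $\mathbf{H}$ with $\mathrm{rk}(\mathbf{H}) = r$. Let $F = G\mathbf{H}$ and let $g_i$ and $f_i$ be the $i$th row of $G$ and $F$, respectively. Since $g_i$ contains uniformly independent components,

$$\Pr\{g_i = \mathbf{g}\} = q^{-m}.$$

For $\mathbf{f}$ with $\mathbf{f}^\top \in \langle \mathbf{H}^\top \rangle$,

$$\Pr\{g_i\mathbf{H} = \mathbf{f}\} = q^{-m}|\mathrm{Ker}(\mathbf{H})| = q^{-r},$$

where $\mathrm{Ker}(\mathbf{H}) = \{\mathbf{g} : \mathbf{g}\mathbf{H} = 0\}$ and $|\mathrm{Ker}(\mathbf{H})| = q^{m - \mathrm{rk}(\mathbf{H})}$. So for $\mathbf{F}$ with $\langle \mathbf{F}^\top \rangle \subseteq \langle \mathbf{H}^\top \rangle$,

$$p_{GH|H}(\mathbf{F}|\mathbf{H}) = \Pr\{g_i\mathbf{H} = \mathbf{f}_i,\ i = 1, \cdots, s\} = \prod_{i=1}^{s} \Pr\{g_i\mathbf{H} = \mathbf{f}_i\} = q^{-sr}. \qquad (42)$$

Thus,

$$p_{\mathrm{rk}(GH)|H}(s|\mathbf{H}) = q^{-sr}\, |\{\mathbf{F} : \langle \mathbf{F}^\top \rangle \subseteq \langle \mathbf{H}^\top \rangle,\ \mathrm{rk}(\mathbf{F}) = s\}| = q^{-sr}\chi^r_s = \zeta^r_s,$$

where $|\{\mathbf{F} : \langle \mathbf{F}^\top \rangle \subseteq \langle \mathbf{H}^\top \rangle,\ \mathrm{rk}(\mathbf{F}) = s\}| = \chi^r_s$ follows from Lemma 10. Last, since $\mathrm{rk}(H) \to H \to \mathrm{rk}(GH)$ forms a Markov chain,

$$p_{\mathrm{rk}(GH)|\mathrm{rk}(H)}(s|r) = \sum_{\mathbf{H}:\,\mathrm{rk}(\mathbf{H}) = r} p_{\mathrm{rk}(GH)|H}(s|\mathbf{H})\, p_{H|\mathrm{rk}(H)}(\mathbf{H}|r) = \zeta^r_s \sum_{\mathbf{H}:\,\mathrm{rk}(\mathbf{H}) = r} p_{H|\mathrm{rk}(H)}(\mathbf{H}|r) = \zeta^r_s.$$

The proof is complete. ■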
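Lemma 11 can be verified by enumeration for a small binary example. The sketch below fixes a hypothetical $3 \times 3$ matrix over GF(2) of rank 2, enumerates every $s \times 3$ matrix $G$, and compares the fraction with $\mathrm{rk}(GH) = s$ against $\zeta^r_s$. Rows are stored as integer bitmasks and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def zeta(m, r, q=2):
    """zeta^m_r = (1 - q^-m)(1 - q^-(m-1)) ... (1 - q^-(m-r+1)); equals 1 for r = 0."""
    out = Fraction(1)
    for k in range(r):
        out *= 1 - Fraction(1, q ** (m - k))
    return out

def prob_full_row_rank(H, s):
    """Pr{rk(G H) = s} for a uniform s x m binary G, by enumerating every G."""
    m = len(H)
    hits = 0
    for G in product(range(2 ** m), repeat=s):
        GH = []
        for g in G:
            acc = 0
            for j, h in enumerate(H):
                if (g >> j) & 1:       # bit j of a row of G selects row j of H
                    acc ^= h
            GH.append(acc)
        hits += rank_gf2(GH) == s
    return Fraction(hits, 2 ** (m * s))

H_rank2 = [0b001, 0b010, 0b011]        # hypothetical 3 x 3 matrix of rank 2
H_rank3 = [0b001, 0b010, 0b100]        # identity, rank 3
```

The probabilities depend on $H$ only through its rank, as the lemma states: `prob_full_row_rank(H_rank2, 2)` equals `zeta(2, 2)` and `prob_full_row_rank(H_rank3, 2)` equals `zeta(3, 2)`.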
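Lemma 12: The number of $r$-dimensional subspaces of $\mathbb{F}^m$ is given by the Gaussian binomial $\binom{m}{r}_q = \chi^m_r / \chi^r_r$.

Proof: Define an equivalence relation on $\mathrm{Fr}(\mathbb{F}^{m\times r})$ by $X \sim X'$ if $\langle X \rangle = \langle X' \rangle$. The equivalence class $[X]$ is the set of all matrices that are equivalent to $X$. We have $[X] = \{X\Phi : \Phi \in \mathrm{Fr}(\mathbb{F}^{r\times r})\}$. Thus $|[X]| = |\mathrm{Fr}(\mathbb{F}^{r\times r})| = \chi^r_r$. Since $\mathrm{Gr}(r, \mathbb{F}^m) = \mathrm{Fr}(\mathbb{F}^{m\times r})/\sim$, the quotient set of $\mathrm{Fr}(\mathbb{F}^{m\times r})$ by $\sim$, we have $|\mathrm{Gr}(r, \mathbb{F}^m)| = |\mathrm{Fr}(\mathbb{F}^{m\times r})| / |[X]| = \chi^m_r / \chi^r_r$. ■

Lemma 13: For $m \ge r'$ and $r \ge r'$, define a set $S = \{X \in \mathbb{F}^{m\times r} : \mathrm{rk}(X) = r'\}$. Then

$$|S| = \frac{\chi^m_{r'}\, \chi^r_{r'}}{\chi^{r'}_{r'}} = \chi^{m,r}_{r'}. \qquad (43)$$

Furthermore,

$$\sum_{r' \le \min\{m, r\}} \chi^{m,r}_{r'} = q^{mr}. \qquad (44)$$

Proof: The column vectors of $X \in S$ span an $r'$-dimensional subspace of an $m$-dimensional vector space. Let $\{V_1, V_2, \cdots, V_n\}$ be the set of $r'$-dimensional subspaces of an $m$-dimensional vector space, where $n = \binom{m}{r'}_q$. Let $S_{V_i} = \{X \in \mathbb{F}^{m\times r} : \langle X \rangle = V_i\}$; the sets $\{S_{V_i}\}$ form a partition of $S$. By Lemma 20, $|S_{V_i}| = \chi^r_{r'}$. Therefore,

$$|S| = n|S_{V_i}| = \binom{m}{r'}_q \chi^r_{r'} = \chi^{m,r}_{r'}. \qquad (45)$$

The equality in (44) follows because both sides are the number of $m \times r$ matrices. ■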

Lemma 14: Let $V \le \mathbb{F}^m$ be an $s$-dimensional subspace. Then, the number of subspaces $U$ with $V \le U$ and $\dim(U) = r$ is

(-7), = (46) 

Proof: Let $U$ be a subspace with $V \le U$ and $\dim(U) = r$. Then we can write $U = V + U'$, where $U'$ is a subspace with $\dim(U') = r - s$ and $V \cap U' = \{0\}$. Given $U$, such $U'$ is unique. The number of $U'$ is the number of $(r - s)$-dimensional subspaces of an $(m - s)$-dimensional space, i.e., $\binom{m-s}{r-s}_q$. The equality in (46) is the direct result of the definitions. ■

Appendix B 
Useful Results 

Lemma 15: For $r \le m$, $-\log_2 \zeta^m_r < 1.8$.

Proof: Define

$$\Xi_q(s) = \prod_{i=s}^{\infty} (1 - q^{-i}). \qquad (47)$$

So $\zeta^m_r > \Xi_q(m - r + 1)$. We know $\Xi_q(s + 1) > \Xi_q(s) > \Xi_{q-1}(s) \ge \Xi_2(1)$, where $\Xi_2(1)$ is a mathematical constant with approximate value 0.28879. Thus $-\log_2 \zeta^m_r < -\log_2 \Xi_2(1) < -\log_2 0.2887 < 1.8$. ■
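The constant used in the proof of Lemma 15 can be checked numerically. The sketch below truncates the infinite product after a fixed number of factors, which is an assumption of the illustration (the neglected factors differ from 1 by less than $2^{-200}$ for $q = 2$); the function names are illustrative.

```python
from math import log2

def xi(q, s, terms=200):
    """Truncated product prod_{i >= s} (1 - q^-i), using `terms` factors."""
    out = 1.0
    for i in range(s, s + terms):
        out *= 1 - q ** (-i)
    return out

def zeta(m, r, q=2):
    """zeta^m_r = (1 - q^-m)(1 - q^-(m-1)) ... (1 - q^-(m-r+1))."""
    out = 1.0
    for k in range(r):
        out *= 1 - q ** (-(m - k))
    return out

constant = xi(2, 1)                 # the product for q = 2 starting at i = 1
worst = max(-log2(zeta(m, r)) for m in range(1, 13) for r in range(m + 1))
```

The value `-log2(constant)` stays below 1.8, and so does `worst`, the largest value of $-\log_2 \zeta^m_r$ over the small binary cases enumerated here.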
Lemma 16: lim T ^oo Ttog^q = r ' 
Proof:



lim = i im 

T^ooT\og 2 q T->oc Tlog 2 q 



= lim lim %££L 

T^oo T log 2 Q T^oo T log 2 (7 

= + r. 



Lemma 17: | Pj(F m )| < g ™ 2 /2+iog 5 m +c ^ where Q < Lg is a constant 
Proof: Refer to the proof of Lemma Q3] We have 

|Pj(F m )h 




l 



(m—r)r 



_ ? m 2 /2+log,(m/H,(l)) 

< 9 m 2 /2+log,m+log 2 (l/E: 2 (l))^ 

Let c = log 2 (l/S 2 (l)). By H 2 (l) « 0.28879, we obtain c < 1.8. ■ 

Lemma 18: For $V \le U \le \mathbb{F}^T$ and $V' \le U' \le \mathbb{F}^T$ with $\dim(U) = \dim(U')$ and $\dim(V) = \dim(V')$, we can find $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$ such that $\Phi U = U'$ and $\Phi V = V'$.

Proof: Find a basis $\{b_i : i = 1, \cdots, T\}$ of $\mathbb{F}^T$ such that $\{b_i : i = 1, \cdots, r\}$ is a basis of $U$ and $\{b_i : i = 1, \cdots, s\}$ is a basis of $V$. We can do this by first finding a basis of $V$, extending it to a basis of $U$, and further extending it to a basis of $\mathbb{F}^T$. Similarly, find a basis $\{b'_i : i = 1, \cdots, T\}$ of $\mathbb{F}^T$ such that $\{b'_i : i = 1, \cdots, r\}$ is a basis of $U'$ and $\{b'_i : i = 1, \cdots, s\}$ is a basis of $V'$. Consider the linear system of equations

$$\Phi b_i = b'_i, \quad i = 1, \ldots, T.$$

We know there exists a unique $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$ satisfying this linear system, and $\Phi V = V'$ and $\Phi U = U'$. ■
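Lemma 19: For $X, X' \in \mathbb{F}^{T\times M}$, $\langle X^\top \rangle = \langle X'^\top \rangle$ if and only if there exists $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$ such that $X' = \Phi X$.

Proof: Let $r = \mathrm{rk}(X)$. First, show a) $\Rightarrow$ c). Fix one full-rank decomposition $X = BD$. Since $\langle D^\top \rangle = \langle X^\top \rangle = \langle X'^\top \rangle$, we can find a decomposition $X' = B'D$ using the same procedure we described by first fixing $D$. Second, show c) $\Rightarrow$ b). With the decomposition in c), we can find $\Phi \in \mathrm{Fr}(\mathbb{F}^{T\times T})$ such that $\Phi B = B'$. Extend $B$ and $B'$ to full-rank $T \times T$ matrices $[B\ \bar{B}]$ and $[B'\ \bar{B}']$. Then, $\Phi = [B'\ \bar{B}'][B\ \bar{B}]^{-1}$ is one such matrix, since $\Phi[B\ \bar{B}] = [B'\ \bar{B}']$. Last, we have b) $\Rightarrow$ a). ■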
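Lemma 20: For $U \le \mathbb{F}^t$ with $\dim(U) = r \le m$, let

$$A(m, U) = \{X \in \mathbb{F}^{t\times m} : \langle X \rangle = U\}.$$

Then,

$$|A(m, U)| = \chi^m_r,$$

and for $\Phi \in \mathrm{Fr}(\mathbb{F}^{t\times t})$,

$$A(m, \Phi U) = \Phi A(m, U).$$

Proof: Find a $t \times r$ matrix $B$ with $\langle B \rangle = U$. Then, we have

$$A(m, U) = \{BD : D \in \mathrm{Fr}(\mathbb{F}^{r\times m})\} = B\,\mathrm{Fr}(\mathbb{F}^{r\times m}).$$

Thus, $|A(m, U)| = |\mathrm{Fr}(\mathbb{F}^{r\times m})| = \chi^m_r$. For $\Phi \in \mathrm{Fr}(\mathbb{F}^{t\times t})$, $\langle \Phi B \rangle = \Phi U$. So $A(m, \Phi U) = \Phi B\,\mathrm{Fr}(\mathbb{F}^{r\times m}) = \Phi A(m, U)$. ■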